diff mbox

[ARM] PR rtl-optimization/69904: Disallow copying/duplicating of load-exclusive operations

Message ID 56D80554.3060805@foss.arm.com
State New
Headers show

Commit Message

Kyrill Tkachov March 3, 2016, 9:35 a.m. UTC
Hi all,

In this PR shrink-wrapping ends up duplicating the load-exclusive part of a load-exclusive/store-exclusive loop
used to implement an atomic compare exchange operation. Look in bugzilla for the kind of sequences generated.
The load-exclusive paths are duplicated, but not the store-exclusive ones, with some very counter-intuitive
control flow between them.

The suggested solution is to catch the load exclusive instructions in TARGET_CANNOT_COPY_INSN_P so that
the RTL optimisers don't do any funny tricks with them. Since these instructions only really appear when
implementing atomic operations it's unlikely that the user will want particularly fancy transformations involving
duplication of exclusive access sequences anyway (though examples to the contrary would certainly be appreciated)
and this fix seems like the safest at this point. This is a regression fix from GCC 5 as shrink-wrapping got more
aggressive for GCC 6.

With this patch the sequence is not shrink-wrapped and is the same as the sequence for GCC 5 as shown in the PR.

Bootstrapped and tested on arm-none-linux-gnueabihf.

Ok for trunk?

Thanks,
Kyrill


2016-03-03  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     PR rtl-optimization/69904
     * config/arm/arm.c (arm_cannot_copy_insn_p):
     Return true for load-exclusive instructions.

2016-03-03  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     PR rtl-optimization/69904
     * gcc.target/arm/pr69904.c: New test.

Comments

Ramana Radhakrishnan March 3, 2016, 9:42 a.m. UTC | #1
On 03/03/16 09:35, Kyrill Tkachov wrote:
> Hi all,
> 
> In this PR shrink-wrapping ends up duplicating the load-exclusive part of a load-exclusive/store-exclusive loop
> used to implement an atomic compare exchange operation. Look in bugzilla for the kind of sequences generated.
> The load-exclusive paths are duplicated, but not the store-exclusive ones, with some very counter-intuitive
> control flow between them.
> 
> The suggested solution is to catch the load exclusive instructions in TARGET_CANNOT_COPY_INSN_P so that
> the RTL optimisers don't do any funny tricks with them. Since these instructions only really appear when
> implementing atomic operations it's unlikely that the user will want particularly fancy transformations involving
> duplication of exclusive access sequences anyway (though examples to the contrary would certainly be appreciated)
> and this fix seems like the safest at this point. This is a regression fix from GCC 5 as shrink-wrapping got more
> aggressive for GCC 6.
> 
> With this patch the sequence is not shrink-wrapped and is the same as the sequence for GCC 5 as shown in the PR.
> 
> Bootstrapped and tested on arm-none-linux-gnueabihf.
> 
> Ok for trunk?

While reading the patch again I wondered if we should also catch store exclusives here but the cases we are interested in preventing are copies of blocks containing ldrex instructions which almost always are paired with an strex instruction i.e. I don't think an strex instruction can exist without a corresponding ldrex instruction. Thus your patch looks sufficient.


It would also be good to audit the aarch64 backend for any such sequences but that need not hold up this patch.

Thus, Ok for trunk.

Thanks,
Ramana

> 
> Thanks,
> Kyrill
> 
> 
> 2016-03-03  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
> 
>     PR rtl-optimization/69904
>     * config/arm/arm.c (arm_cannot_copy_insn_p):
>     Return true for load-exclusive instructions.
> 
> 2016-03-03  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
> 
>     PR rtl-optimization/69904
>     * gcc.target/arm/pr69904.c: New test.
Kyrill Tkachov March 3, 2016, 9:44 a.m. UTC | #2
On 03/03/16 09:42, Ramana Radhakrishnan wrote:
> On 03/03/16 09:35, Kyrill Tkachov wrote:
>> Hi all,
>>
>> In this PR shrink-wrapping ends up duplicating the load-exclusive part of a load-exclusive/store-exclusive loop
>> used to implement an atomic compare exchange operation. Look in bugzilla for the kind of sequences generated.
>> The load-exclusive paths are duplicated, but not the store-exclusive ones, with some very counter-intuitive
>> control flow between them.
>>
>> The suggested solution is to catch the load exclusive instructions in TARGET_CANNOT_COPY_INSN_P so that
>> the RTL optimisers don't do any funny tricks with them. Since these instructions only really appear when
>> implementing atomic operations it's unlikely that the user will want particularly fancy transformations involving
>> duplication of exclusive access sequences anyway (though examples to the contrary would certainly be appreciated)
>> and this fix seems like the safest at this point. This is a regression fix from GCC 5 as shrink-wrapping got more
>> aggressive for GCC 6.
>>
>> With this patch the sequence is not shrink-wrapped and is the same as the sequence for GCC 5 as shown in the PR.
>>
>> Bootstrapped and tested on arm-none-linux-gnueabihf.
>>
>> Ok for trunk?
> While reading the patch again I wondered if we should also catch store exclusives here but the cases we are interested in preventing are copies of blocks containing ldrex instructions which almost always are paired with an strex instruction i.e. I don't think an strex instruction can exist without a corresponding ldrex instruction. Thus your patch looks sufficient.

I had considered that too and came to the same conclusion.

>
> It would also be good to audit the aarch64 backend for any such sequences but that need not hold up this patch.
>
> Thus, Ok for trunk.

Thanks, I'll commit shortly.

Kyrill

> Thanks,
> Ramana
>
>> Thanks,
>> Kyrill
>>
>>
>> 2016-03-03  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>
>>      PR rtl-optimization/69904
>>      * config/arm/arm.c (arm_cannot_copy_insn_p):
>>      Return true for load-exclusive instructions.
>>
>> 2016-03-03  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>
>>      PR rtl-optimization/69904
>>      * gcc.target/arm/pr69904.c: New test.
diff mbox

Patch

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 10559625cd74c99b786009143a8543deb196462e..3d14ccfb3b2bfdaa1e9c0f58556d6dc5898bf142 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -13270,7 +13270,11 @@  tls_mentioned_p (rtx x)
     }
 }
 
-/* Must not copy any rtx that uses a pc-relative address.  */
+/* Must not copy any rtx that uses a pc-relative address.
+   Also, disallow copying of load-exclusive instructions that
+   may appear after splitting of compare-and-swap-style operations
+   so as to prevent those loops from being transformed away from their
+   canonical forms (see PR 69904).  */
 
 static bool
 arm_cannot_copy_insn_p (rtx_insn *insn)
@@ -13289,6 +13293,20 @@  arm_cannot_copy_insn_p (rtx_insn *insn)
 	      || XINT (x, 1) == UNSPEC_PIC_UNIFIED))
 	return true;
     }
+
+  rtx set = single_set (insn);
+  if (set)
+    {
+      rtx src = SET_SRC (set);
+      if (GET_CODE (src) == ZERO_EXTEND)
+	src = XEXP (src, 0);
+
+      /* Catch the load-exclusive and load-acquire operations.  */
+      if (GET_CODE (src) == UNSPEC_VOLATILE
+	  && (XINT (src, 1) == VUNSPEC_LL
+	      || XINT (src, 1) == VUNSPEC_LAX))
+	return true;
+    }
   return false;
 }
 
diff --git a/gcc/testsuite/gcc.target/arm/pr69904.c b/gcc/testsuite/gcc.target/arm/pr69904.c
new file mode 100644
index 0000000000000000000000000000000000000000..24fe844d58eba445ddbe3877e227d0086310192a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr69904.c
@@ -0,0 +1,24 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -marm" } */
+/* { dg-require-effective-target arm_arch_v7a_ok } */
+/* { dg-add-options arm_arch_v7a } */
+
+/* Make sure that RTL optimizers don't do any unexpected transformations
+   on the compare_exchange loop.  */
+
+#include <stdatomic.h>
+
+atomic_uint foo;
+atomic_uint bar;
+int glob;
+
+int
+main (void)
+{
+  glob = atomic_compare_exchange_strong (&foo, &bar, 0);
+  return glob;
+}
+
+/* { dg-final { scan-assembler-times "dmb\tish" 2 } } */
+/* { dg-final { scan-assembler-times "ldrex\t" 1 } } */
+/* { dg-final { scan-assembler-times "strex\t" 1 } } */