diff mbox series

aarch64: Fix TImode __sync_*_compare_and_exchange expansion with LSE [PR114310]

Message ID ZfK3MQlg4stgtHTo@tucnak
State New
Headers show
Series aarch64: Fix TImode __sync_*_compare_and_exchange expansion with LSE [PR114310] | expand

Commit Message

Jakub Jelinek March 14, 2024, 8:37 a.m. UTC
Hi!

The following testcase ICEs with LSE atomics.
The problem is that the @atomic_compare_and_swap<mode> expander uses
aarch64_reg_or_zero predicate for the desired operand, which is fine,
given that for most of the modes and even for TImode in some cases
it can handle zero immediate just fine, but the TImode
@aarch64_compare_and_swap<mode>_lse just uses register_operand for
that operand instead, again intentionally so, because the casp,
caspa, caspl and caspal instructions need to use a pair of consecutive
registers for the operand and xzr is just one register and we can't
just store zero into the link register to emulate pair of zeros.

So, the following patch fixes that by forcing the newval operand into
a register for the TImode LSE case.

Bootstrapped/regtested on aarch64-linux, ok for trunk?

2024-03-14  Jakub Jelinek  <jakub@redhat.com>

	PR target/114310
	* config/aarch64/aarch64.cc (aarch64_expand_compare_and_swap): For
	TImode force newval into a register.

	* gcc.dg/pr114310.c: New test.


	Jakub

Comments

Richard Earnshaw (lists) March 14, 2024, 11:49 a.m. UTC | #1
On 14/03/2024 08:37, Jakub Jelinek wrote:
> Hi!
> 
> The following testcase ICEs with LSE atomics.
> The problem is that the @atomic_compare_and_swap<mode> expander uses
> aarch64_reg_or_zero predicate for the desired operand, which is fine,
> given that for most of the modes and even for TImode in some cases
> it can handle zero immediate just fine, but the TImode
> @aarch64_compare_and_swap<mode>_lse just uses register_operand for
> that operand instead, again intentionally so, because the casp,
> caspa, caspl and caspal instructions need to use a pair of consecutive
> registers for the operand and xzr is just one register and we can't
> just store zero into the link register to emulate pair of zeros.
> 
> So, the following patch fixes that by forcing the newval operand into
> a register for the TImode LSE case.
> 
> Bootstrapped/regtested on aarch64-linux, ok for trunk?

An alternative fix would be to use a mode_attr to pick a different 
predicate for TImode.  But that's probably just a matter of taste; I'm 
not sure that one would be better than the other in reality.

OK (or with my suggestion if you prefer).

R.

> 
> 2024-03-14  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR target/114310
> 	* config/aarch64/aarch64.cc (aarch64_expand_compare_and_swap): For
> 	TImode force newval into a register.
> 
> 	* gcc.dg/pr114310.c: New test.
> 
> --- gcc/config/aarch64/aarch64.cc.jj	2024-03-12 10:16:12.024101665 +0100
> +++ gcc/config/aarch64/aarch64.cc	2024-03-13 18:55:39.147986554 +0100
> @@ -24693,6 +24693,8 @@ aarch64_expand_compare_and_swap (rtx ope
>           rval = copy_to_mode_reg (r_mode, oldval);
>         else
>   	emit_move_insn (rval, gen_lowpart (r_mode, oldval));
> +      if (mode == TImode)
> +	newval = force_reg (mode, newval);
>   
>         emit_insn (gen_aarch64_compare_and_swap_lse (mode, rval, mem,
>   						   newval, mod_s));
> --- gcc/testsuite/gcc.dg/pr114310.c.jj	2024-03-13 19:09:25.322597418 +0100
> +++ gcc/testsuite/gcc.dg/pr114310.c	2024-03-13 19:08:50.802073314 +0100
> @@ -0,0 +1,20 @@
> +/* PR target/114310 */
> +/* { dg-do run { target int128 } } */
> +
> +volatile __attribute__((aligned (sizeof (__int128_t)))) __int128_t v = 10;
> +
> +int
> +main ()
> +{
> +#if __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16
> +  if (__sync_val_compare_and_swap (&v, (__int128_t) 10, (__int128_t) 0) != 10)
> +    __builtin_abort ();
> +  if (__sync_val_compare_and_swap (&v, (__int128_t) 10, (__int128_t) 15) != 0)
> +    __builtin_abort ();
> +  if (__sync_val_compare_and_swap (&v, (__int128_t) 0, (__int128_t) 42) != 0)
> +    __builtin_abort ();
> +  if (__sync_val_compare_and_swap (&v, (__int128_t) 31, (__int128_t) 35) != 42)
> +    __builtin_abort ();
> +#endif
> +  return 0;
> +}
> 
> 	Jakub
>
Jakub Jelinek March 14, 2024, 1:02 p.m. UTC | #2
On Thu, Mar 14, 2024 at 11:49:14AM +0000, Richard Earnshaw (lists) wrote:
> On 14/03/2024 08:37, Jakub Jelinek wrote:
> > Hi!
> > 
> > The following testcase ICEs with LSE atomics.
> > The problem is that the @atomic_compare_and_swap<mode> expander uses
> > aarch64_reg_or_zero predicate for the desired operand, which is fine,
> > given that for most of the modes and even for TImode in some cases
> > it can handle zero immediate just fine, but the TImode
> > @aarch64_compare_and_swap<mode>_lse just uses register_operand for
> > that operand instead, again intentionally so, because the casp,
> > caspa, caspl and caspal instructions need to use a pair of consecutive
> > registers for the operand and xzr is just one register and we can't
> > just store zero into the link register to emulate pair of zeros.
> > 
> > So, the following patch fixes that by forcing the newval operand into
> > a register for the TImode LSE case.
> > 
> > Bootstrapped/regtested on aarch64-linux, ok for trunk?
> 
> An alternative fix would be to use a mode_attr to pick a different predicate
> for TImode.  But that's probably just a matter of taste; I'm not sure that
> one would be better than the other in reality.

The reason I didn't do something like that is that for the non-LSE case,
@aarch64_compare_and_swap<mode> pattern actually handles const0_rtx desired
operand and so does the TARGET_OUTLINE_ATOMICS case, so if it was done through
a predicate, it would need to be a predicate that would be specific to this
case and would yield register_operand for TImode && TARGET_LSE and and
otherwise aarch64_reg_or_zero.  The patch seems to be simpler to me,
especially when aarch64_expand_compare_and_swap already does the forcing
stuff into registers at various other spots.

	Jakub
diff mbox series

Patch

--- gcc/config/aarch64/aarch64.cc.jj	2024-03-12 10:16:12.024101665 +0100
+++ gcc/config/aarch64/aarch64.cc	2024-03-13 18:55:39.147986554 +0100
@@ -24693,6 +24693,8 @@  aarch64_expand_compare_and_swap (rtx ope
         rval = copy_to_mode_reg (r_mode, oldval);
       else
 	emit_move_insn (rval, gen_lowpart (r_mode, oldval));
+      if (mode == TImode)
+	newval = force_reg (mode, newval);
 
       emit_insn (gen_aarch64_compare_and_swap_lse (mode, rval, mem,
 						   newval, mod_s));
--- gcc/testsuite/gcc.dg/pr114310.c.jj	2024-03-13 19:09:25.322597418 +0100
+++ gcc/testsuite/gcc.dg/pr114310.c	2024-03-13 19:08:50.802073314 +0100
@@ -0,0 +1,20 @@ 
+/* PR target/114310 */
+/* { dg-do run { target int128 } } */
+
+volatile __attribute__((aligned (sizeof (__int128_t)))) __int128_t v = 10;
+
+int
+main ()
+{
+#if __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16
+  if (__sync_val_compare_and_swap (&v, (__int128_t) 10, (__int128_t) 0) != 10)
+    __builtin_abort ();
+  if (__sync_val_compare_and_swap (&v, (__int128_t) 10, (__int128_t) 15) != 0)
+    __builtin_abort ();
+  if (__sync_val_compare_and_swap (&v, (__int128_t) 0, (__int128_t) 42) != 0)
+    __builtin_abort ();
+  if (__sync_val_compare_and_swap (&v, (__int128_t) 31, (__int128_t) 35) != 42)
+    __builtin_abort ();
+#endif
+  return 0;
+}