Message ID | ZfK3MQlg4stgtHTo@tucnak |
---|---|
State | New |
Headers | show |
Series | aarch64: Fix TImode __sync_*_compare_and_exchange expansion with LSE [PR114310] | expand |
On 14/03/2024 08:37, Jakub Jelinek wrote: > Hi! > > The following testcase ICEs with LSE atomics. > The problem is that the @atomic_compare_and_swap<mode> expander uses > aarch64_reg_or_zero predicate for the desired operand, which is fine, > given that for most of the modes and even for TImode in some cases > it can handle zero immediate just fine, but the TImode > @aarch64_compare_and_swap<mode>_lse just uses register_operand for > that operand instead, again intentionally so, because the casp, > caspa, caspl and caspal instructions need to use a pair of consecutive > registers for the operand and xzr is just one register and we can't > just store zero into the link register to emulate pair of zeros. > > So, the following patch fixes that by forcing the newval operand into > a register for the TImode LSE case. > > Bootstrapped/regtested on aarch64-linux, ok for trunk? An alternative fix would be to use a mode_attr to pick a different predicate for TImode. But that's probably just a matter of taste; I'm not sure that one would be better than the other in reality. OK (or with my suggestion if you prefer). R. > > 2024-03-14 Jakub Jelinek <jakub@redhat.com> > > PR target/114310 > * config/aarch64/aarch64.cc (aarch64_expand_compare_and_swap): For > TImode force newval into a register. > > * gcc.dg/pr114310.c: New test. > > --- gcc/config/aarch64/aarch64.cc.jj 2024-03-12 10:16:12.024101665 +0100 > +++ gcc/config/aarch64/aarch64.cc 2024-03-13 18:55:39.147986554 +0100 > @@ -24693,6 +24693,8 @@ aarch64_expand_compare_and_swap (rtx ope > rval = copy_to_mode_reg (r_mode, oldval); > else > emit_move_insn (rval, gen_lowpart (r_mode, oldval)); > + if (mode == TImode) > + newval = force_reg (mode, newval); > > emit_insn (gen_aarch64_compare_and_swap_lse (mode, rval, mem, > newval, mod_s)); > --- gcc/testsuite/gcc.dg/pr114310.c.jj 2024-03-13 19:09:25.322597418 +0100 > +++ gcc/testsuite/gcc.dg/pr114310.c 2024-03-13 19:08:50.802073314 +0100 > @@ -0,0 +1,20 @@ > +/* PR target/114310 */ > +/* { dg-do run { target int128 } } */ > + > +volatile __attribute__((aligned (sizeof (__int128_t)))) __int128_t v = 10; > + > +int > +main () > +{ > +#if __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16 > + if (__sync_val_compare_and_swap (&v, (__int128_t) 10, (__int128_t) 0) != 10) > + __builtin_abort (); > + if (__sync_val_compare_and_swap (&v, (__int128_t) 10, (__int128_t) 15) != 0) > + __builtin_abort (); > + if (__sync_val_compare_and_swap (&v, (__int128_t) 0, (__int128_t) 42) != 0) > + __builtin_abort (); > + if (__sync_val_compare_and_swap (&v, (__int128_t) 31, (__int128_t) 35) != 42) > + __builtin_abort (); > +#endif > + return 0; > +} > > Jakub >
On Thu, Mar 14, 2024 at 11:49:14AM +0000, Richard Earnshaw (lists) wrote: > On 14/03/2024 08:37, Jakub Jelinek wrote: > > Hi! > > > > The following testcase ICEs with LSE atomics. > > The problem is that the @atomic_compare_and_swap<mode> expander uses > > aarch64_reg_or_zero predicate for the desired operand, which is fine, > > given that for most of the modes and even for TImode in some cases > > it can handle zero immediate just fine, but the TImode > > @aarch64_compare_and_swap<mode>_lse just uses register_operand for > > that operand instead, again intentionally so, because the casp, > > caspa, caspl and caspal instructions need to use a pair of consecutive > > registers for the operand and xzr is just one register and we can't > > just store zero into the link register to emulate pair of zeros. > > > > So, the following patch fixes that by forcing the newval operand into > > a register for the TImode LSE case. > > > > Bootstrapped/regtested on aarch64-linux, ok for trunk? > > An alternative fix would be to use a mode_attr to pick a different predicate > for TImode. But that's probably just a matter of taste; I'm not sure that > one would be better than the other in reality. The reason I didn't do something like that is that for the non-LSE case, @aarch64_compare_and_swap<mode> pattern actually handles const0_rtx desired operand and so does the TARGET_OUTLINE_ATOMICS case, so if it was done through a predicate, it would need to be a predicate that would be specific to this case and would yield register_operand for TImode && TARGET_LSE and and otherwise aarch64_reg_or_zero. The patch seems to be simpler to me, especially when aarch64_expand_compare_and_swap already does the forcing stuff into registers at various other spots. Jakub
--- gcc/config/aarch64/aarch64.cc.jj 2024-03-12 10:16:12.024101665 +0100 +++ gcc/config/aarch64/aarch64.cc 2024-03-13 18:55:39.147986554 +0100 @@ -24693,6 +24693,8 @@ aarch64_expand_compare_and_swap (rtx ope rval = copy_to_mode_reg (r_mode, oldval); else emit_move_insn (rval, gen_lowpart (r_mode, oldval)); + if (mode == TImode) + newval = force_reg (mode, newval); emit_insn (gen_aarch64_compare_and_swap_lse (mode, rval, mem, newval, mod_s)); --- gcc/testsuite/gcc.dg/pr114310.c.jj 2024-03-13 19:09:25.322597418 +0100 +++ gcc/testsuite/gcc.dg/pr114310.c 2024-03-13 19:08:50.802073314 +0100 @@ -0,0 +1,20 @@ +/* PR target/114310 */ +/* { dg-do run { target int128 } } */ + +volatile __attribute__((aligned (sizeof (__int128_t)))) __int128_t v = 10; + +int +main () +{ +#if __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16 + if (__sync_val_compare_and_swap (&v, (__int128_t) 10, (__int128_t) 0) != 10) + __builtin_abort (); + if (__sync_val_compare_and_swap (&v, (__int128_t) 10, (__int128_t) 15) != 0) + __builtin_abort (); + if (__sync_val_compare_and_swap (&v, (__int128_t) 0, (__int128_t) 42) != 0) + __builtin_abort (); + if (__sync_val_compare_and_swap (&v, (__int128_t) 31, (__int128_t) 35) != 42) + __builtin_abort (); +#endif + return 0; +}