Message ID | 20210415143939.4ss6j36lyk6j6f7n@arm.com |
---|---|
State | New |
Headers | show |
Series | arm: Fix ICEs with compare-and-swap and -march=armv8-m.base [PR99977] | expand |
Ping On 15/04/2021 15:39, Alex Coplan via Gcc-patches wrote: > Hi all, > > The PR shows two ICEs with __sync_bool_compare_and_swap and > -mcpu=cortex-m23 (equivalently, -march=armv8-m.base): one in LRA and one > later on, after the CAS insn is split. > > The LRA ICE occurs because the > @atomic_compare_and_swap<CCSI:arch><SIDI:mode>_1 pattern attempts to tie > two output operands together (operands 0 and 1 in the third > alternative). LRA can't handle this, since it doesn't make sense for an > insn to assign to the same operand twice. > > The later (post-splitting) ICE occurs because the expansion of the > cbranchsi4_scratch insn doesn't quite go according to plan. As it > stands, arm_split_compare_and_swap calls gen_cbranchsi4_scratch, > attempting to pass a register (neg_bval) to use as a scratch register. > However, since the RTL template has a match_scratch here, > gen_cbranchsi4_scratch ignores this argument and produces a scratch rtx. > Since this is all happening after RA, this is doomed to fail (and we get > an ICE about the insn not matching its constraints). > > It seems that the motivation for the choice of constraints in the > atomic_compare_and_swap pattern comes from an attempt to satisfy the > constraints of the cbranchsi4_scratch insn. This insn requires the > scratch register to be the same as the input register in the case that > we use a larger negative immediate (one that satisfies J, but not L). > > Of course, as noted above, LRA refuses to assign two output operands to > the same register, so this was never going to work. > > The solution I'm proposing here is to collapse the alternatives to the > CAS insn (allowing the two output register operands to be matched to > different registers) and to ensure that the constraints for > cbranchsi4_scratch are met in arm_split_compare_and_swap. We do this by > inserting a move to ensure the source and destination registers match if > necessary (i.e. in the case of large negative immediates). > > Another notable change here is that we only do: > > emit_move_insn (neg_bval, const1_rtx); > > for non-negative immediates. This is because the ADDS instruction used in > the negative case suffices to leave a suitable value in neg_bval: if the > operands compare equal, we don't take the branch (so neg_bval will be > set by the load exclusive). Otherwise, the ADDS will leave a nonzero > value in neg_bval, which will correctly signal that the CAS has failed > when it is later negated. > > Testing: > * Bootstrapped and regtested on arm-linux-gnueabihf, no regressions. > * Regtested an arm-eabi cross configured with --with-arch=armv8-m.base, no > regressions. The patch fixes the gcc.dg/ia64-sync-3.c test in this config. > > OK for trunk? > > Thanks, > Alex > > gcc/ChangeLog: > > PR target/99977 > * config/arm/arm.c (arm_split_compare_and_swap): Fix up codegen > with negative immediates: ensure we expand cbranchsi4_scratch > correctly and ensure we satisfy its constraints. > * config/arm/sync.md > (@atomic_compare_and_swap<CCSI:arch><NARROW:mode>_1): Don't > attempt to tie two output operands together with constraints; > collapse two alternatives. > (@atomic_compare_and_swap<CCSI:arch><SIDI:mode>_1): Likewise. > * config/arm/thumb1.md (cbranchsi4_neg_late): New. > > gcc/testsuite/ChangeLog: > > PR target/99977 > * gcc.target/arm/pr99977.c: New test. > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c > index 475fb0d827f..8d19b8a73fd 100644 > --- a/gcc/config/arm/arm.c > +++ b/gcc/config/arm/arm.c > @@ -30737,13 +30737,31 @@ arm_split_compare_and_swap (rtx operands[]) > } > else > { > - emit_move_insn (neg_bval, const1_rtx); > cond = gen_rtx_NE (VOIDmode, rval, oldval); > if (thumb1_cmpneg_operand (oldval, SImode)) > - emit_unlikely_jump (gen_cbranchsi4_scratch (neg_bval, rval, oldval, > - label2, cond)); > + { > + rtx src = rval; > + if (!satisfies_constraint_L (oldval)) > + { > + gcc_assert (satisfies_constraint_J (oldval)); > + > + /* For such immediates, ADDS needs the source and destination regs > + to be the same. > + > + Normally this would be handled by RA, but this is all happening > + after RA. */ > + emit_move_insn (neg_bval, rval); > + src = neg_bval; > + } > + > + emit_unlikely_jump (gen_cbranchsi4_neg_late (neg_bval, src, oldval, > + label2, cond)); > + } > else > - emit_unlikely_jump (gen_cbranchsi4_insn (cond, rval, oldval, label2)); > + { > + emit_move_insn (neg_bval, const1_rtx); > + emit_unlikely_jump (gen_cbranchsi4_insn (cond, rval, oldval, label2)); > + } > } > > arm_emit_store_exclusive (mode, neg_bval, mem, newval, use_release); > diff --git a/gcc/config/arm/sync.md b/gcc/config/arm/sync.md > index e4682c039b9..b9fa8702606 100644 > --- a/gcc/config/arm/sync.md > +++ b/gcc/config/arm/sync.md > @@ -187,20 +187,20 @@ > ;; Constraints of this pattern must be at least as strict as those of the > ;; cbranchsi operations in thumb1.md and aim to be as permissive. > (define_insn_and_split "@atomic_compare_and_swap<CCSI:arch><NARROW:mode>_1" > - [(set (match_operand:CCSI 0 "cc_register_operand" "=&c,&l,&l,&l") ;; bool out > + [(set (match_operand:CCSI 0 "cc_register_operand" "=&c,&l,&l") ;; bool out > (unspec_volatile:CCSI [(const_int 0)] VUNSPEC_ATOMIC_CAS)) > - (set (match_operand:SI 1 "s_register_operand" "=&r,&l,&0,&l*h") ;; val out > + (set (match_operand:SI 1 "s_register_operand" "=&r,&l,&l*h") ;; val out > (zero_extend:SI > - (match_operand:NARROW 2 "mem_noofs_operand" "+Ua,Ua,Ua,Ua"))) ;; memory > + (match_operand:NARROW 2 "mem_noofs_operand" "+Ua,Ua,Ua"))) ;; memory > (set (match_dup 2) > (unspec_volatile:NARROW > - [(match_operand:SI 3 "arm_add_operand" "rIL,lIL*h,J,*r") ;; expected > - (match_operand:NARROW 4 "s_register_operand" "r,r,r,r") ;; desired > + [(match_operand:SI 3 "arm_add_operand" "rIL,lILJ*h,*r") ;; expected > + (match_operand:NARROW 4 "s_register_operand" "r,r,r") ;; desired > (match_operand:SI 5 "const_int_operand") ;; is_weak > (match_operand:SI 6 "const_int_operand") ;; mod_s > (match_operand:SI 7 "const_int_operand")] ;; mod_f > VUNSPEC_ATOMIC_CAS)) > - (clobber (match_scratch:SI 8 "=&r,X,X,X"))] > + (clobber (match_scratch:SI 8 "=&r,X,X"))] > "<NARROW:sync_predtab>" > "#" > "&& reload_completed" > @@ -209,7 +209,7 @@ > arm_split_compare_and_swap (operands); > DONE; > } > - [(set_attr "arch" "32,v8mb,v8mb,v8mb")]) > + [(set_attr "arch" "32,v8mb,v8mb")]) > > (define_mode_attr cas_cmp_operand > [(SI "arm_add_operand") (DI "cmpdi_operand")]) > @@ -219,19 +219,19 @@ > ;; Constraints of this pattern must be at least as strict as those of the > ;; cbranchsi operations in thumb1.md and aim to be as permissive. > (define_insn_and_split "@atomic_compare_and_swap<CCSI:arch><SIDI:mode>_1" > - [(set (match_operand:CCSI 0 "cc_register_operand" "=&c,&l,&l,&l") ;; bool out > + [(set (match_operand:CCSI 0 "cc_register_operand" "=&c,&l,&l") ;; bool out > (unspec_volatile:CCSI [(const_int 0)] VUNSPEC_ATOMIC_CAS)) > - (set (match_operand:SIDI 1 "s_register_operand" "=&r,&l,&0,&l*h") ;; val out > - (match_operand:SIDI 2 "mem_noofs_operand" "+Ua,Ua,Ua,Ua")) ;; memory > + (set (match_operand:SIDI 1 "s_register_operand" "=&r,&l,&l*h") ;; val out > + (match_operand:SIDI 2 "mem_noofs_operand" "+Ua,Ua,Ua")) ;; memory > (set (match_dup 2) > (unspec_volatile:SIDI > - [(match_operand:SIDI 3 "<SIDI:cas_cmp_operand>" "<SIDI:cas_cmp_str>,lIL*h,J,*r") ;; expect > - (match_operand:SIDI 4 "s_register_operand" "r,r,r,r") ;; desired > + [(match_operand:SIDI 3 "<SIDI:cas_cmp_operand>" "<SIDI:cas_cmp_str>,lILJ*h,*r") ;; expect > + (match_operand:SIDI 4 "s_register_operand" "r,r,r") ;; desired > (match_operand:SI 5 "const_int_operand") ;; is_weak > (match_operand:SI 6 "const_int_operand") ;; mod_s > (match_operand:SI 7 "const_int_operand")] ;; mod_f > VUNSPEC_ATOMIC_CAS)) > - (clobber (match_scratch:SI 8 "=&r,X,X,X"))] > + (clobber (match_scratch:SI 8 "=&r,X,X"))] > "<SIDI:sync_predtab>" > "#" > "&& reload_completed" > @@ -240,7 +240,7 @@ > arm_split_compare_and_swap (operands); > DONE; > } > - [(set_attr "arch" "32,v8mb,v8mb,v8mb")]) > + [(set_attr "arch" "32,v8mb,v8mb")]) > > (define_insn_and_split "atomic_exchange<mode>" > [(set (match_operand:QHSD 0 "s_register_operand" "=&r,&r") ;; output > diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md > index c98b59c4fa7..084ed6597e0 100644 > --- a/gcc/config/arm/thumb1.md > +++ b/gcc/config/arm/thumb1.md > @@ -1206,6 +1206,21 @@ > (set_attr "type" "multiple")] > ) > > +;; An expander which makes use of the cbranchsi4_scratch insn, but can > +;; be used safely after RA. > +(define_expand "cbranchsi4_neg_late" > + [(parallel [ > + (set (pc) (if_then_else > + (match_operator 4 "arm_comparison_operator" > + [(match_operand:SI 1 "s_register_operand") > + (match_operand:SI 2 "thumb1_cmpneg_operand")]) > + (label_ref (match_operand 3 "" "")) > + (pc))) > + (clobber (match_operand:SI 0 "s_register_operand")) > + ])] > + "TARGET_THUMB1" > +) > + > ;; Changes to the constraints of this pattern must be propagated to those of > ;; atomic compare_and_swap splitters in sync.md. These must be at least as > ;; strict as the constraints here and aim to be as permissive. > diff --git a/gcc/testsuite/gcc.target/arm/pr99977.c b/gcc/testsuite/gcc.target/arm/pr99977.c > new file mode 100644 > index 00000000000..7911899d928 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/arm/pr99977.c > @@ -0,0 +1,6 @@ > +/* { dg-do compile } */ > +/* { dg-options "-march=armv8-m.base -mfloat-abi=soft -O2" } */ > +_Bool f1(int *p) { return __sync_bool_compare_and_swap (p, -1, 2); } > +_Bool f2(int *p) { return __sync_bool_compare_and_swap (p, -8, 2); } > +int g1(int *p) { return __sync_val_compare_and_swap (p, -1, 2); } > +int g2(int *p) { return __sync_val_compare_and_swap (p, -8, 3); }
Hi Alex, > -----Original Message----- > From: Alex Coplan <Alex.Coplan@arm.com> > Sent: 27 April 2021 14:14 > To: gcc-patches@gcc.gnu.org > Cc: nickc@redhat.com; Richard Earnshaw <Richard.Earnshaw@arm.com>; > Ramana Radhakrishnan <Ramana.Radhakrishnan@arm.com>; Kyrylo > Tkachov <Kyrylo.Tkachov@arm.com> > Subject: Re: [PATCH] arm: Fix ICEs with compare-and-swap and - > march=armv8-m.base [PR99977] > > Ping > > On 15/04/2021 15:39, Alex Coplan via Gcc-patches wrote: > > Hi all, > > > > The PR shows two ICEs with __sync_bool_compare_and_swap and > > -mcpu=cortex-m23 (equivalently, -march=armv8-m.base): one in LRA and > one > > later on, after the CAS insn is split. > > > > The LRA ICE occurs because the > > @atomic_compare_and_swap<CCSI:arch><SIDI:mode>_1 pattern > attempts to tie > > two output operands together (operands 0 and 1 in the third > > alternative). LRA can't handle this, since it doesn't make sense for an > > insn to assign to the same operand twice. > > > > The later (post-splitting) ICE occurs because the expansion of the > > cbranchsi4_scratch insn doesn't quite go according to plan. As it > > stands, arm_split_compare_and_swap calls gen_cbranchsi4_scratch, > > attempting to pass a register (neg_bval) to use as a scratch register. > > However, since the RTL template has a match_scratch here, > > gen_cbranchsi4_scratch ignores this argument and produces a scratch rtx. > > Since this is all happening after RA, this is doomed to fail (and we get > > an ICE about the insn not matching its constraints). > > > > It seems that the motivation for the choice of constraints in the > > atomic_compare_and_swap pattern comes from an attempt to satisfy the > > constraints of the cbranchsi4_scratch insn. This insn requires the > > scratch register to be the same as the input register in the case that > > we use a larger negative immediate (one that satisfies J, but not L). > > > > Of course, as noted above, LRA refuses to assign two output operands to > > the same register, so this was never going to work. > > > > The solution I'm proposing here is to collapse the alternatives to the > > CAS insn (allowing the two output register operands to be matched to > > different registers) and to ensure that the constraints for > > cbranchsi4_scratch are met in arm_split_compare_and_swap. We do this > by > > inserting a move to ensure the source and destination registers match if > > necessary (i.e. in the case of large negative immediates). > > > > Another notable change here is that we only do: > > > > emit_move_insn (neg_bval, const1_rtx); > > > > for non-negative immediates. This is because the ADDS instruction used in > > the negative case suffices to leave a suitable value in neg_bval: if the > > operands compare equal, we don't take the branch (so neg_bval will be > > set by the load exclusive). Otherwise, the ADDS will leave a nonzero > > value in neg_bval, which will correctly signal that the CAS has failed > > when it is later negated. > > > > Testing: > > * Bootstrapped and regtested on arm-linux-gnueabihf, no regressions. > > * Regtested an arm-eabi cross configured with --with-arch=armv8-m.base, > no > > regressions. The patch fixes the gcc.dg/ia64-sync-3.c test in this config. > > > > OK for trunk? Ok. Thanks, Kyrill > > > > Thanks, > > Alex > > > > gcc/ChangeLog: > > > > PR target/99977 > > * config/arm/arm.c (arm_split_compare_and_swap): Fix up codegen > > with negative immediates: ensure we expand cbranchsi4_scratch > > correctly and ensure we satisfy its constraints. > > * config/arm/sync.md > > (@atomic_compare_and_swap<CCSI:arch><NARROW:mode>_1): > Don't > > attempt to tie two output operands together with constraints; > > collapse two alternatives. > > (@atomic_compare_and_swap<CCSI:arch><SIDI:mode>_1): Likewise. > > * config/arm/thumb1.md (cbranchsi4_neg_late): New. > > > > gcc/testsuite/ChangeLog: > > > > PR target/99977 > > * gcc.target/arm/pr99977.c: New test. > > > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c > > index 475fb0d827f..8d19b8a73fd 100644 > > --- a/gcc/config/arm/arm.c > > +++ b/gcc/config/arm/arm.c > > @@ -30737,13 +30737,31 @@ arm_split_compare_and_swap (rtx > operands[]) > > } > > else > > { > > - emit_move_insn (neg_bval, const1_rtx); > > cond = gen_rtx_NE (VOIDmode, rval, oldval); > > if (thumb1_cmpneg_operand (oldval, SImode)) > > - emit_unlikely_jump (gen_cbranchsi4_scratch (neg_bval, rval, oldval, > > - label2, cond)); > > + { > > + rtx src = rval; > > + if (!satisfies_constraint_L (oldval)) > > + { > > + gcc_assert (satisfies_constraint_J (oldval)); > > + > > + /* For such immediates, ADDS needs the source and destination > regs > > + to be the same. > > + > > + Normally this would be handled by RA, but this is all > happening > > + after RA. */ > > + emit_move_insn (neg_bval, rval); > > + src = neg_bval; > > + } > > + > > + emit_unlikely_jump (gen_cbranchsi4_neg_late (neg_bval, src, oldval, > > + label2, cond)); > > + } > > else > > - emit_unlikely_jump (gen_cbranchsi4_insn (cond, rval, oldval, label2)); > > + { > > + emit_move_insn (neg_bval, const1_rtx); > > + emit_unlikely_jump (gen_cbranchsi4_insn (cond, rval, oldval, > label2)); > > + } > > } > > > > arm_emit_store_exclusive (mode, neg_bval, mem, newval, use_release); > > diff --git a/gcc/config/arm/sync.md b/gcc/config/arm/sync.md > > index e4682c039b9..b9fa8702606 100644 > > --- a/gcc/config/arm/sync.md > > +++ b/gcc/config/arm/sync.md > > @@ -187,20 +187,20 @@ > > ;; Constraints of this pattern must be at least as strict as those of the > > ;; cbranchsi operations in thumb1.md and aim to be as permissive. > > (define_insn_and_split > "@atomic_compare_and_swap<CCSI:arch><NARROW:mode>_1" > > - [(set (match_operand:CCSI 0 "cc_register_operand" "=&c,&l,&l,&l") ;; > bool out > > + [(set (match_operand:CCSI 0 "cc_register_operand" "=&c,&l,&l") ;; > bool out > > (unspec_volatile:CCSI [(const_int 0)] VUNSPEC_ATOMIC_CAS)) > > - (set (match_operand:SI 1 "s_register_operand" "=&r,&l,&0,&l*h") ;; val > out > > + (set (match_operand:SI 1 "s_register_operand" "=&r,&l,&l*h") ;; val > out > > (zero_extend:SI > > - (match_operand:NARROW 2 "mem_noofs_operand" > "+Ua,Ua,Ua,Ua"))) ;; memory > > + (match_operand:NARROW 2 "mem_noofs_operand" "+Ua,Ua,Ua"))) > ;; memory > > (set (match_dup 2) > > (unspec_volatile:NARROW > > - [(match_operand:SI 3 "arm_add_operand" "rIL,lIL*h,J,*r") ;; > expected > > - (match_operand:NARROW 4 "s_register_operand" "r,r,r,r") ;; > desired > > + [(match_operand:SI 3 "arm_add_operand" "rIL,lILJ*h,*r") ;; > expected > > + (match_operand:NARROW 4 "s_register_operand" "r,r,r") ;; > desired > > (match_operand:SI 5 "const_int_operand") ;; is_weak > > (match_operand:SI 6 "const_int_operand") ;; mod_s > > (match_operand:SI 7 "const_int_operand")] ;; mod_f > > VUNSPEC_ATOMIC_CAS)) > > - (clobber (match_scratch:SI 8 "=&r,X,X,X"))] > > + (clobber (match_scratch:SI 8 "=&r,X,X"))] > > "<NARROW:sync_predtab>" > > "#" > > "&& reload_completed" > > @@ -209,7 +209,7 @@ > > arm_split_compare_and_swap (operands); > > DONE; > > } > > - [(set_attr "arch" "32,v8mb,v8mb,v8mb")]) > > + [(set_attr "arch" "32,v8mb,v8mb")]) > > > > (define_mode_attr cas_cmp_operand > > [(SI "arm_add_operand") (DI "cmpdi_operand")]) > > @@ -219,19 +219,19 @@ > > ;; Constraints of this pattern must be at least as strict as those of the > > ;; cbranchsi operations in thumb1.md and aim to be as permissive. > > (define_insn_and_split > "@atomic_compare_and_swap<CCSI:arch><SIDI:mode>_1" > > - [(set (match_operand:CCSI 0 "cc_register_operand" "=&c,&l,&l,&l") ;; > bool out > > + [(set (match_operand:CCSI 0 "cc_register_operand" "=&c,&l,&l") ;; > bool out > > (unspec_volatile:CCSI [(const_int 0)] VUNSPEC_ATOMIC_CAS)) > > - (set (match_operand:SIDI 1 "s_register_operand" "=&r,&l,&0,&l*h") > ;; val out > > - (match_operand:SIDI 2 "mem_noofs_operand" "+Ua,Ua,Ua,Ua")) > ;; memory > > + (set (match_operand:SIDI 1 "s_register_operand" "=&r,&l,&l*h") ;; val > out > > + (match_operand:SIDI 2 "mem_noofs_operand" "+Ua,Ua,Ua")) ;; > memory > > (set (match_dup 2) > > (unspec_volatile:SIDI > > - [(match_operand:SIDI 3 "<SIDI:cas_cmp_operand>" > "<SIDI:cas_cmp_str>,lIL*h,J,*r") ;; expect > > - (match_operand:SIDI 4 "s_register_operand" "r,r,r,r") ;; desired > > + [(match_operand:SIDI 3 "<SIDI:cas_cmp_operand>" > "<SIDI:cas_cmp_str>,lILJ*h,*r") ;; expect > > + (match_operand:SIDI 4 "s_register_operand" "r,r,r") ;; desired > > (match_operand:SI 5 "const_int_operand") ;; is_weak > > (match_operand:SI 6 "const_int_operand") ;; mod_s > > (match_operand:SI 7 "const_int_operand")] ;; mod_f > > VUNSPEC_ATOMIC_CAS)) > > - (clobber (match_scratch:SI 8 "=&r,X,X,X"))] > > + (clobber (match_scratch:SI 8 "=&r,X,X"))] > > "<SIDI:sync_predtab>" > > "#" > > "&& reload_completed" > > @@ -240,7 +240,7 @@ > > arm_split_compare_and_swap (operands); > > DONE; > > } > > - [(set_attr "arch" "32,v8mb,v8mb,v8mb")]) > > + [(set_attr "arch" "32,v8mb,v8mb")]) > > > > (define_insn_and_split "atomic_exchange<mode>" > > [(set (match_operand:QHSD 0 "s_register_operand" "=&r,&r") ;; > output > > diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md > > index c98b59c4fa7..084ed6597e0 100644 > > --- a/gcc/config/arm/thumb1.md > > +++ b/gcc/config/arm/thumb1.md > > @@ -1206,6 +1206,21 @@ > > (set_attr "type" "multiple")] > > ) > > > > +;; An expander which makes use of the cbranchsi4_scratch insn, but can > > +;; be used safely after RA. > > +(define_expand "cbranchsi4_neg_late" > > + [(parallel [ > > + (set (pc) (if_then_else > > + (match_operator 4 "arm_comparison_operator" > > + [(match_operand:SI 1 "s_register_operand") > > + (match_operand:SI 2 "thumb1_cmpneg_operand")]) > > + (label_ref (match_operand 3 "" "")) > > + (pc))) > > + (clobber (match_operand:SI 0 "s_register_operand")) > > + ])] > > + "TARGET_THUMB1" > > +) > > + > > ;; Changes to the constraints of this pattern must be propagated to those > of > > ;; atomic compare_and_swap splitters in sync.md. These must be at least > as > > ;; strict as the constraints here and aim to be as permissive. > > diff --git a/gcc/testsuite/gcc.target/arm/pr99977.c > b/gcc/testsuite/gcc.target/arm/pr99977.c > > new file mode 100644 > > index 00000000000..7911899d928 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/arm/pr99977.c > > @@ -0,0 +1,6 @@ > > +/* { dg-do compile } */ > > +/* { dg-options "-march=armv8-m.base -mfloat-abi=soft -O2" } */ > > +_Bool f1(int *p) { return __sync_bool_compare_and_swap (p, -1, 2); } > > +_Bool f2(int *p) { return __sync_bool_compare_and_swap (p, -8, 2); } > > +int g1(int *p) { return __sync_val_compare_and_swap (p, -1, 2); } > > +int g2(int *p) { return __sync_val_compare_and_swap (p, -8, 3); }
Hi Kyrill, On 27/04/2021 13:47, Kyrylo Tkachov wrote: > Hi Alex, > > > -----Original Message----- > > From: Alex Coplan <Alex.Coplan@arm.com> > > Sent: 27 April 2021 14:14 > > To: gcc-patches@gcc.gnu.org > > Cc: nickc@redhat.com; Richard Earnshaw <Richard.Earnshaw@arm.com>; > > Ramana Radhakrishnan <Ramana.Radhakrishnan@arm.com>; Kyrylo > > Tkachov <Kyrylo.Tkachov@arm.com> > > Subject: Re: [PATCH] arm: Fix ICEs with compare-and-swap and - > > march=armv8-m.base [PR99977] > > > > Ping > > > > On 15/04/2021 15:39, Alex Coplan via Gcc-patches wrote: > > > Hi all, > > > > > > The PR shows two ICEs with __sync_bool_compare_and_swap and > > > -mcpu=cortex-m23 (equivalently, -march=armv8-m.base): one in LRA and > > one > > > later on, after the CAS insn is split. > > > > > > The LRA ICE occurs because the > > > @atomic_compare_and_swap<CCSI:arch><SIDI:mode>_1 pattern > > attempts to tie > > > two output operands together (operands 0 and 1 in the third > > > alternative). LRA can't handle this, since it doesn't make sense for an > > > insn to assign to the same operand twice. > > > > > > The later (post-splitting) ICE occurs because the expansion of the > > > cbranchsi4_scratch insn doesn't quite go according to plan. As it > > > stands, arm_split_compare_and_swap calls gen_cbranchsi4_scratch, > > > attempting to pass a register (neg_bval) to use as a scratch register. > > > However, since the RTL template has a match_scratch here, > > > gen_cbranchsi4_scratch ignores this argument and produces a scratch rtx. > > > Since this is all happening after RA, this is doomed to fail (and we get > > > an ICE about the insn not matching its constraints). > > > > > > It seems that the motivation for the choice of constraints in the > > > atomic_compare_and_swap pattern comes from an attempt to satisfy the > > > constraints of the cbranchsi4_scratch insn. This insn requires the > > > scratch register to be the same as the input register in the case that > > > we use a larger negative immediate (one that satisfies J, but not L). > > > > > > Of course, as noted above, LRA refuses to assign two output operands to > > > the same register, so this was never going to work. > > > > > > The solution I'm proposing here is to collapse the alternatives to the > > > CAS insn (allowing the two output register operands to be matched to > > > different registers) and to ensure that the constraints for > > > cbranchsi4_scratch are met in arm_split_compare_and_swap. We do this > > by > > > inserting a move to ensure the source and destination registers match if > > > necessary (i.e. in the case of large negative immediates). > > > > > > Another notable change here is that we only do: > > > > > > emit_move_insn (neg_bval, const1_rtx); > > > > > > for non-negative immediates. This is because the ADDS instruction used in > > > the negative case suffices to leave a suitable value in neg_bval: if the > > > operands compare equal, we don't take the branch (so neg_bval will be > > > set by the load exclusive). Otherwise, the ADDS will leave a nonzero > > > value in neg_bval, which will correctly signal that the CAS has failed > > > when it is later negated. > > > > > > Testing: > > > * Bootstrapped and regtested on arm-linux-gnueabihf, no regressions. > > > * Regtested an arm-eabi cross configured with --with-arch=armv8-m.base, > > no > > > regressions. The patch fixes the gcc.dg/ia64-sync-3.c test in this config. > > > > > > OK for trunk? > > Ok. The patch applies cleanly on the 11 branch and passes bootstrap/regtest on arm-linux-gnueabihf as well as a regtest on arm-eabi configured with --with-arch=armv8-m.base. OK for the 11 branch? OK for the other affected branches if the same is true there? Thanks, Alex > Thanks, > Kyrill > > > > > > > Thanks, > > > Alex > > > > > > gcc/ChangeLog: > > > > > > PR target/99977 > > > * config/arm/arm.c (arm_split_compare_and_swap): Fix up codegen > > > with negative immediates: ensure we expand cbranchsi4_scratch > > > correctly and ensure we satisfy its constraints. > > > * config/arm/sync.md > > > (@atomic_compare_and_swap<CCSI:arch><NARROW:mode>_1): > > Don't > > > attempt to tie two output operands together with constraints; > > > collapse two alternatives. > > > (@atomic_compare_and_swap<CCSI:arch><SIDI:mode>_1): Likewise. > > > * config/arm/thumb1.md (cbranchsi4_neg_late): New. > > > > > > gcc/testsuite/ChangeLog: > > > > > > PR target/99977 > > > * gcc.target/arm/pr99977.c: New test. > > > > > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c > > > index 475fb0d827f..8d19b8a73fd 100644 > > > --- a/gcc/config/arm/arm.c > > > +++ b/gcc/config/arm/arm.c > > > @@ -30737,13 +30737,31 @@ arm_split_compare_and_swap (rtx > > operands[]) > > > } > > > else > > > { > > > - emit_move_insn (neg_bval, const1_rtx); > > > cond = gen_rtx_NE (VOIDmode, rval, oldval); > > > if (thumb1_cmpneg_operand (oldval, SImode)) > > > - emit_unlikely_jump (gen_cbranchsi4_scratch (neg_bval, rval, oldval, > > > - label2, cond)); > > > + { > > > + rtx src = rval; > > > + if (!satisfies_constraint_L (oldval)) > > > + { > > > + gcc_assert (satisfies_constraint_J (oldval)); > > > + > > > + /* For such immediates, ADDS needs the source and destination > > regs > > > + to be the same. > > > + > > > + Normally this would be handled by RA, but this is all > > happening > > > + after RA. */ > > > + emit_move_insn (neg_bval, rval); > > > + src = neg_bval; > > > + } > > > + > > > + emit_unlikely_jump (gen_cbranchsi4_neg_late (neg_bval, src, oldval, > > > + label2, cond)); > > > + } > > > else > > > - emit_unlikely_jump (gen_cbranchsi4_insn (cond, rval, oldval, label2)); > > > + { > > > + emit_move_insn (neg_bval, const1_rtx); > > > + emit_unlikely_jump (gen_cbranchsi4_insn (cond, rval, oldval, > > label2)); > > > + } > > > } > > > > > > arm_emit_store_exclusive (mode, neg_bval, mem, newval, use_release); > > > diff --git a/gcc/config/arm/sync.md b/gcc/config/arm/sync.md > > > index e4682c039b9..b9fa8702606 100644 > > > --- a/gcc/config/arm/sync.md > > > +++ b/gcc/config/arm/sync.md > > > @@ -187,20 +187,20 @@ > > > ;; Constraints of this pattern must be at least as strict as those of the > > > ;; cbranchsi operations in thumb1.md and aim to be as permissive. > > > (define_insn_and_split > > "@atomic_compare_and_swap<CCSI:arch><NARROW:mode>_1" > > > - [(set (match_operand:CCSI 0 "cc_register_operand" "=&c,&l,&l,&l") ;; > > bool out > > > + [(set (match_operand:CCSI 0 "cc_register_operand" "=&c,&l,&l") ;; > > bool out > > > (unspec_volatile:CCSI [(const_int 0)] VUNSPEC_ATOMIC_CAS)) > > > - (set (match_operand:SI 1 "s_register_operand" "=&r,&l,&0,&l*h") ;; val > > out > > > + (set (match_operand:SI 1 "s_register_operand" "=&r,&l,&l*h") ;; val > > out > > > (zero_extend:SI > > > - (match_operand:NARROW 2 "mem_noofs_operand" > > "+Ua,Ua,Ua,Ua"))) ;; memory > > > + (match_operand:NARROW 2 "mem_noofs_operand" "+Ua,Ua,Ua"))) > > ;; memory > > > (set (match_dup 2) > > > (unspec_volatile:NARROW > > > - [(match_operand:SI 3 "arm_add_operand" "rIL,lIL*h,J,*r") ;; > > expected > > > - (match_operand:NARROW 4 "s_register_operand" "r,r,r,r") ;; > > desired > > > + [(match_operand:SI 3 "arm_add_operand" "rIL,lILJ*h,*r") ;; > > expected > > > + (match_operand:NARROW 4 "s_register_operand" "r,r,r") ;; > > desired > > > (match_operand:SI 5 "const_int_operand") ;; is_weak > > > (match_operand:SI 6 "const_int_operand") ;; mod_s > > > (match_operand:SI 7 "const_int_operand")] ;; mod_f > > > VUNSPEC_ATOMIC_CAS)) > > > - (clobber (match_scratch:SI 8 "=&r,X,X,X"))] > > > + (clobber (match_scratch:SI 8 "=&r,X,X"))] > > > "<NARROW:sync_predtab>" > > > "#" > > > "&& reload_completed" > > > @@ -209,7 +209,7 @@ > > > arm_split_compare_and_swap (operands); > > > DONE; > > > } > > > - [(set_attr "arch" "32,v8mb,v8mb,v8mb")]) > > > + [(set_attr "arch" "32,v8mb,v8mb")]) > > > > > > (define_mode_attr cas_cmp_operand > > > [(SI "arm_add_operand") (DI "cmpdi_operand")]) > > > @@ -219,19 +219,19 @@ > > > ;; Constraints of this pattern must be at least as strict as those of the > > > ;; cbranchsi operations in thumb1.md and aim to be as permissive. > > > (define_insn_and_split > > "@atomic_compare_and_swap<CCSI:arch><SIDI:mode>_1" > > > - [(set (match_operand:CCSI 0 "cc_register_operand" "=&c,&l,&l,&l") ;; > > bool out > > > + [(set (match_operand:CCSI 0 "cc_register_operand" "=&c,&l,&l") ;; > > bool out > > > (unspec_volatile:CCSI [(const_int 0)] VUNSPEC_ATOMIC_CAS)) > > > - (set (match_operand:SIDI 1 "s_register_operand" "=&r,&l,&0,&l*h") > > ;; val out > > > - (match_operand:SIDI 2 "mem_noofs_operand" "+Ua,Ua,Ua,Ua")) > > ;; memory > > > + (set (match_operand:SIDI 1 "s_register_operand" "=&r,&l,&l*h") ;; val > > out > > > + (match_operand:SIDI 2 "mem_noofs_operand" "+Ua,Ua,Ua")) ;; > > memory > > > (set (match_dup 2) > > > (unspec_volatile:SIDI > > > - [(match_operand:SIDI 3 "<SIDI:cas_cmp_operand>" > > "<SIDI:cas_cmp_str>,lIL*h,J,*r") ;; expect > > > - (match_operand:SIDI 4 "s_register_operand" "r,r,r,r") ;; desired > > > + [(match_operand:SIDI 3 "<SIDI:cas_cmp_operand>" > > "<SIDI:cas_cmp_str>,lILJ*h,*r") ;; expect > > > + (match_operand:SIDI 4 "s_register_operand" "r,r,r") ;; desired > > > (match_operand:SI 5 "const_int_operand") ;; is_weak > > > (match_operand:SI 6 "const_int_operand") ;; mod_s > > > (match_operand:SI 7 "const_int_operand")] ;; mod_f > > > VUNSPEC_ATOMIC_CAS)) > > > - (clobber (match_scratch:SI 8 "=&r,X,X,X"))] > > > + (clobber (match_scratch:SI 8 "=&r,X,X"))] > > > "<SIDI:sync_predtab>" > > > "#" > > > "&& reload_completed" > > > @@ -240,7 +240,7 @@ > > > arm_split_compare_and_swap (operands); > > > DONE; > > > } > > > - [(set_attr "arch" "32,v8mb,v8mb,v8mb")]) > > > + [(set_attr "arch" "32,v8mb,v8mb")]) > > > > > > (define_insn_and_split "atomic_exchange<mode>" > > > [(set (match_operand:QHSD 0 "s_register_operand" "=&r,&r") ;; > > output > > > diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md > > > index c98b59c4fa7..084ed6597e0 100644 > > > --- a/gcc/config/arm/thumb1.md > > > +++ b/gcc/config/arm/thumb1.md > > > @@ -1206,6 +1206,21 @@ > > > (set_attr "type" "multiple")] > > > ) > > > > > > +;; An expander which makes use of the cbranchsi4_scratch insn, but can > > > +;; be used safely after RA. > > > +(define_expand "cbranchsi4_neg_late" > > > + [(parallel [ > > > + (set (pc) (if_then_else > > > + (match_operator 4 "arm_comparison_operator" > > > + [(match_operand:SI 1 "s_register_operand") > > > + (match_operand:SI 2 "thumb1_cmpneg_operand")]) > > > + (label_ref (match_operand 3 "" "")) > > > + (pc))) > > > + (clobber (match_operand:SI 0 "s_register_operand")) > > > + ])] > > > + "TARGET_THUMB1" > > > +) > > > + > > > ;; Changes to the constraints of this pattern must be propagated to those > > of > > > ;; atomic compare_and_swap splitters in sync.md. These must be at least > > as > > > ;; strict as the constraints here and aim to be as permissive. > > > diff --git a/gcc/testsuite/gcc.target/arm/pr99977.c > > b/gcc/testsuite/gcc.target/arm/pr99977.c > > > new file mode 100644 > > > index 00000000000..7911899d928 > > > --- /dev/null > > > +++ b/gcc/testsuite/gcc.target/arm/pr99977.c > > > @@ -0,0 +1,6 @@ > > > +/* { dg-do compile } */ > > > +/* { dg-options "-march=armv8-m.base -mfloat-abi=soft -O2" } */ > > > +_Bool f1(int *p) { return __sync_bool_compare_and_swap (p, -1, 2); } > > > +_Bool f2(int *p) { return __sync_bool_compare_and_swap (p, -8, 2); } > > > +int g1(int *p) { return __sync_val_compare_and_swap (p, -1, 2); } > > > +int g2(int *p) { return __sync_val_compare_and_swap (p, -8, 3); }
> -----Original Message----- > From: Alex Coplan <Alex.Coplan@arm.com> > Sent: 17 May 2021 17:29 > To: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com> > Cc: gcc-patches@gcc.gnu.org; nickc@redhat.com; Richard Earnshaw > <Richard.Earnshaw@arm.com>; Ramana Radhakrishnan > <Ramana.Radhakrishnan@arm.com> > Subject: Re: [PATCH] arm: Fix ICEs with compare-and-swap and - > march=armv8-m.base [PR99977] > > Hi Kyrill, > > On 27/04/2021 13:47, Kyrylo Tkachov wrote: > > Hi Alex, > > > > > -----Original Message----- > > > From: Alex Coplan <Alex.Coplan@arm.com> > > > Sent: 27 April 2021 14:14 > > > To: gcc-patches@gcc.gnu.org > > > Cc: nickc@redhat.com; Richard Earnshaw > <Richard.Earnshaw@arm.com>; > > > Ramana Radhakrishnan <Ramana.Radhakrishnan@arm.com>; Kyrylo > > > Tkachov <Kyrylo.Tkachov@arm.com> > > > Subject: Re: [PATCH] arm: Fix ICEs with compare-and-swap and - > > > march=armv8-m.base [PR99977] > > > > > > Ping > > > > > > On 15/04/2021 15:39, Alex Coplan via Gcc-patches wrote: > > > > Hi all, > > > > > > > > The PR shows two ICEs with __sync_bool_compare_and_swap and > > > > -mcpu=cortex-m23 (equivalently, -march=armv8-m.base): one in LRA > and > > > one > > > > later on, after the CAS insn is split. > > > > > > > > The LRA ICE occurs because the > > > > @atomic_compare_and_swap<CCSI:arch><SIDI:mode>_1 pattern > > > attempts to tie > > > > two output operands together (operands 0 and 1 in the third > > > > alternative). LRA can't handle this, since it doesn't make sense for an > > > > insn to assign to the same operand twice. > > > > > > > > The later (post-splitting) ICE occurs because the expansion of the > > > > cbranchsi4_scratch insn doesn't quite go according to plan. As it > > > > stands, arm_split_compare_and_swap calls gen_cbranchsi4_scratch, > > > > attempting to pass a register (neg_bval) to use as a scratch register. > > > > However, since the RTL template has a match_scratch here, > > > > gen_cbranchsi4_scratch ignores this argument and produces a scratch > rtx. > > > > Since this is all happening after RA, this is doomed to fail (and we get > > > > an ICE about the insn not matching its constraints). > > > > > > > > It seems that the motivation for the choice of constraints in the > > > > atomic_compare_and_swap pattern comes from an attempt to satisfy > the > > > > constraints of the cbranchsi4_scratch insn. This insn requires the > > > > scratch register to be the same as the input register in the case that > > > > we use a larger negative immediate (one that satisfies J, but not L). > > > > > > > > Of course, as noted above, LRA refuses to assign two output operands > to > > > > the same register, so this was never going to work. > > > > > > > > The solution I'm proposing here is to collapse the alternatives to the > > > > CAS insn (allowing the two output register operands to be matched to > > > > different registers) and to ensure that the constraints for > > > > cbranchsi4_scratch are met in arm_split_compare_and_swap. We do > this > > > by > > > > inserting a move to ensure the source and destination registers match if > > > > necessary (i.e. in the case of large negative immediates). > > > > > > > > Another notable change here is that we only do: > > > > > > > > emit_move_insn (neg_bval, const1_rtx); > > > > > > > > for non-negative immediates. This is because the ADDS instruction used > in > > > > the negative case suffices to leave a suitable value in neg_bval: if the > > > > operands compare equal, we don't take the branch (so neg_bval will be > > > > set by the load exclusive). Otherwise, the ADDS will leave a nonzero > > > > value in neg_bval, which will correctly signal that the CAS has failed > > > > when it is later negated. > > > > > > > > Testing: > > > > * Bootstrapped and regtested on arm-linux-gnueabihf, no regressions. > > > > * Regtested an arm-eabi cross configured with --with-arch=armv8- > m.base, > > > no > > > > regressions. The patch fixes the gcc.dg/ia64-sync-3.c test in this config. > > > > > > > > OK for trunk? > > > > Ok. > > The patch applies cleanly on the 11 branch and passes bootstrap/regtest on > arm-linux-gnueabihf as well as a regtest on arm-eabi configured with > --with-arch=armv8-m.base. > > OK for the 11 branch? OK for the other affected branches if the same is > true there? Yes, Thanks, Kyrill > > Thanks, > Alex > > > Thanks, > > Kyrill > > > > > > > > > > Thanks, > > > > Alex > > > > > > > > gcc/ChangeLog: > > > > > > > > PR target/99977 > > > > * config/arm/arm.c (arm_split_compare_and_swap): Fix up codegen > > > > with negative immediates: ensure we expand cbranchsi4_scratch > > > > correctly and ensure we satisfy its constraints. > > > > * config/arm/sync.md > > > > (@atomic_compare_and_swap<CCSI:arch><NARROW:mode>_1): > > > Don't > > > > attempt to tie two output operands together with constraints; > > > > collapse two alternatives. > > > > (@atomic_compare_and_swap<CCSI:arch><SIDI:mode>_1): Likewise. > > > > * config/arm/thumb1.md (cbranchsi4_neg_late): New. > > > > > > > > gcc/testsuite/ChangeLog: > > > > > > > > PR target/99977 > > > > * gcc.target/arm/pr99977.c: New test. > > > > > > > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c > > > > index 475fb0d827f..8d19b8a73fd 100644 > > > > --- a/gcc/config/arm/arm.c > > > > +++ b/gcc/config/arm/arm.c > > > > @@ -30737,13 +30737,31 @@ arm_split_compare_and_swap (rtx > > > operands[]) > > > > } > > > > else > > > > { > > > > - emit_move_insn (neg_bval, const1_rtx); > > > > cond = gen_rtx_NE (VOIDmode, rval, oldval); > > > > if (thumb1_cmpneg_operand (oldval, SImode)) > > > > - emit_unlikely_jump (gen_cbranchsi4_scratch (neg_bval, rval, oldval, > > > > - label2, cond)); > > > > + { > > > > + rtx src = rval; > > > > + if (!satisfies_constraint_L (oldval)) > > > > + { > > > > + gcc_assert (satisfies_constraint_J (oldval)); > > > > + > > > > + /* For such immediates, ADDS needs the source and destination > > > regs > > > > + to be the same. > > > > + > > > > + Normally this would be handled by RA, but this is all > > > happening > > > > + after RA. */ > > > > + emit_move_insn (neg_bval, rval); > > > > + src = neg_bval; > > > > + } > > > > + > > > > + emit_unlikely_jump (gen_cbranchsi4_neg_late (neg_bval, src, oldval, > > > > + label2, cond)); > > > > + } > > > > else > > > > - emit_unlikely_jump (gen_cbranchsi4_insn (cond, rval, oldval, label2)); > > > > + { > > > > + emit_move_insn (neg_bval, const1_rtx); > > > > + emit_unlikely_jump (gen_cbranchsi4_insn (cond, rval, oldval, > > > label2)); > > > > + } > > > > } > > > > > > > > arm_emit_store_exclusive (mode, neg_bval, mem, newval, > use_release); > > > > diff --git a/gcc/config/arm/sync.md b/gcc/config/arm/sync.md > > > > index e4682c039b9..b9fa8702606 100644 > > > > --- a/gcc/config/arm/sync.md > > > > +++ b/gcc/config/arm/sync.md > > > > @@ -187,20 +187,20 @@ > > > > ;; Constraints of this pattern must be at least as strict as those of the > > > > ;; cbranchsi operations in thumb1.md and aim to be as permissive. > > > > (define_insn_and_split > > > "@atomic_compare_and_swap<CCSI:arch><NARROW:mode>_1" > > > > - [(set (match_operand:CCSI 0 "cc_register_operand" "=&c,&l,&l,&l") > ;; > > > bool out > > > > + [(set (match_operand:CCSI 0 "cc_register_operand" "=&c,&l,&l") > ;; > > > bool out > > > > (unspec_volatile:CCSI [(const_int 0)] VUNSPEC_ATOMIC_CAS)) > > > > - (set (match_operand:SI 1 "s_register_operand" "=&r,&l,&0,&l*h") > ;; val > > > out > > > > + (set (match_operand:SI 1 "s_register_operand" "=&r,&l,&l*h") ;; val > > > out > > > > (zero_extend:SI > > > > - (match_operand:NARROW 2 "mem_noofs_operand" > > > "+Ua,Ua,Ua,Ua"))) ;; memory > > > > + (match_operand:NARROW 2 "mem_noofs_operand" "+Ua,Ua,Ua"))) > > > ;; memory > > > > (set (match_dup 2) > > > > (unspec_volatile:NARROW > > > > - [(match_operand:SI 3 "arm_add_operand" "rIL,lIL*h,J,*r") ;; > > > expected > > > > - (match_operand:NARROW 4 "s_register_operand" "r,r,r,r") ;; > > > desired > > > > + [(match_operand:SI 3 "arm_add_operand" "rIL,lILJ*h,*r") ;; > > > expected > > > > + (match_operand:NARROW 4 "s_register_operand" "r,r,r") ;; > > > desired > > > > (match_operand:SI 5 "const_int_operand") ;; is_weak > > > > (match_operand:SI 6 "const_int_operand") ;; mod_s > > > > (match_operand:SI 7 "const_int_operand")] ;; mod_f > > > > VUNSPEC_ATOMIC_CAS)) > > > > - (clobber (match_scratch:SI 8 "=&r,X,X,X"))] > > > > + (clobber (match_scratch:SI 8 "=&r,X,X"))] > > > > "<NARROW:sync_predtab>" > > > > "#" > > > > "&& reload_completed" > > > > @@ -209,7 +209,7 @@ > > > > arm_split_compare_and_swap (operands); > > > > DONE; > > > > } > > > > - [(set_attr "arch" "32,v8mb,v8mb,v8mb")]) > > > > + [(set_attr "arch" "32,v8mb,v8mb")]) > > > > > > > > (define_mode_attr cas_cmp_operand > > > > [(SI "arm_add_operand") (DI "cmpdi_operand")]) > > > > @@ -219,19 +219,19 @@ > > > > ;; Constraints of this pattern must be at least as strict as those of the > > > > ;; cbranchsi operations in thumb1.md and aim to be as permissive. > > > > (define_insn_and_split > > > "@atomic_compare_and_swap<CCSI:arch><SIDI:mode>_1" > > > > - [(set (match_operand:CCSI 0 "cc_register_operand" "=&c,&l,&l,&l") > ;; > > > bool out > > > > + [(set (match_operand:CCSI 0 "cc_register_operand" "=&c,&l,&l") > ;; > > > bool out > > > > (unspec_volatile:CCSI [(const_int 0)] VUNSPEC_ATOMIC_CAS)) > > > > - (set (match_operand:SIDI 1 "s_register_operand" "=&r,&l,&0,&l*h") > > > ;; val out > > > > - (match_operand:SIDI 2 "mem_noofs_operand" "+Ua,Ua,Ua,Ua")) > > > ;; memory > > > > + (set (match_operand:SIDI 1 "s_register_operand" "=&r,&l,&l*h") > ;; val > > > out > > > > + (match_operand:SIDI 2 "mem_noofs_operand" "+Ua,Ua,Ua")) ;; > > > memory > > > > (set (match_dup 2) > > > > (unspec_volatile:SIDI > > > > - [(match_operand:SIDI 3 "<SIDI:cas_cmp_operand>" > > > "<SIDI:cas_cmp_str>,lIL*h,J,*r") ;; expect > > > > - (match_operand:SIDI 4 "s_register_operand" "r,r,r,r") ;; desired > > > > + [(match_operand:SIDI 3 "<SIDI:cas_cmp_operand>" > > > "<SIDI:cas_cmp_str>,lILJ*h,*r") ;; expect > > > > + (match_operand:SIDI 4 "s_register_operand" "r,r,r") ;; desired > > > > (match_operand:SI 5 "const_int_operand") ;; is_weak > > > > (match_operand:SI 6 "const_int_operand") ;; mod_s > > > > (match_operand:SI 7 "const_int_operand")] ;; mod_f > > > > VUNSPEC_ATOMIC_CAS)) > > > > - (clobber (match_scratch:SI 8 "=&r,X,X,X"))] > > > > + (clobber (match_scratch:SI 8 "=&r,X,X"))] > > > > "<SIDI:sync_predtab>" > > > > "#" > > > > "&& reload_completed" > > > > @@ -240,7 +240,7 @@ > > > > arm_split_compare_and_swap (operands); > > > > DONE; > > > > } > > > > - [(set_attr "arch" "32,v8mb,v8mb,v8mb")]) > > > > + [(set_attr "arch" "32,v8mb,v8mb")]) > > > > > > > > (define_insn_and_split "atomic_exchange<mode>" > > > > [(set (match_operand:QHSD 0 "s_register_operand" "=&r,&r") ;; > > > output > > > > diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md > > > > index c98b59c4fa7..084ed6597e0 100644 > > > > --- a/gcc/config/arm/thumb1.md > > > > +++ b/gcc/config/arm/thumb1.md > > > > @@ -1206,6 +1206,21 @@ > > > > (set_attr "type" "multiple")] > > > > ) > > > > > > > > +;; An expander which makes use of the cbranchsi4_scratch insn, but > can > > > > +;; be used safely after RA. > > > > +(define_expand "cbranchsi4_neg_late" > > > > + [(parallel [ > > > > + (set (pc) (if_then_else > > > > + (match_operator 4 "arm_comparison_operator" > > > > + [(match_operand:SI 1 "s_register_operand") > > > > + (match_operand:SI 2 "thumb1_cmpneg_operand")]) > > > > + (label_ref (match_operand 3 "" "")) > > > > + (pc))) > > > > + (clobber (match_operand:SI 0 "s_register_operand")) > > > > + ])] > > > > + "TARGET_THUMB1" > > > > +) > > > > + > > > > ;; Changes to the constraints of this pattern must be propagated to > those > > > of > > > > ;; atomic compare_and_swap splitters in sync.md. These must be at > least > > > as > > > > ;; strict as the constraints here and aim to be as permissive. > > > > diff --git a/gcc/testsuite/gcc.target/arm/pr99977.c > > > b/gcc/testsuite/gcc.target/arm/pr99977.c > > > > new file mode 100644 > > > > index 00000000000..7911899d928 > > > > --- /dev/null > > > > +++ b/gcc/testsuite/gcc.target/arm/pr99977.c > > > > @@ -0,0 +1,6 @@ > > > > +/* { dg-do compile } */ > > > > +/* { dg-options "-march=armv8-m.base -mfloat-abi=soft -O2" } */ > > > > +_Bool f1(int *p) { return __sync_bool_compare_and_swap (p, -1, 2); } > > > > +_Bool f2(int *p) { return __sync_bool_compare_and_swap (p, -8, 2); } > > > > +int g1(int *p) { return __sync_val_compare_and_swap (p, -1, 2); } > > > > +int g2(int *p) { return __sync_val_compare_and_swap (p, -8, 3); }
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 475fb0d827f..8d19b8a73fd 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -30737,13 +30737,31 @@ arm_split_compare_and_swap (rtx operands[]) } else { - emit_move_insn (neg_bval, const1_rtx); cond = gen_rtx_NE (VOIDmode, rval, oldval); if (thumb1_cmpneg_operand (oldval, SImode)) - emit_unlikely_jump (gen_cbranchsi4_scratch (neg_bval, rval, oldval, - label2, cond)); + { + rtx src = rval; + if (!satisfies_constraint_L (oldval)) + { + gcc_assert (satisfies_constraint_J (oldval)); + + /* For such immediates, ADDS needs the source and destination regs + to be the same. + + Normally this would be handled by RA, but this is all happening + after RA. */ + emit_move_insn (neg_bval, rval); + src = neg_bval; + } + + emit_unlikely_jump (gen_cbranchsi4_neg_late (neg_bval, src, oldval, + label2, cond)); + } else - emit_unlikely_jump (gen_cbranchsi4_insn (cond, rval, oldval, label2)); + { + emit_move_insn (neg_bval, const1_rtx); + emit_unlikely_jump (gen_cbranchsi4_insn (cond, rval, oldval, label2)); + } } arm_emit_store_exclusive (mode, neg_bval, mem, newval, use_release); diff --git a/gcc/config/arm/sync.md b/gcc/config/arm/sync.md index e4682c039b9..b9fa8702606 100644 --- a/gcc/config/arm/sync.md +++ b/gcc/config/arm/sync.md @@ -187,20 +187,20 @@ ;; Constraints of this pattern must be at least as strict as those of the ;; cbranchsi operations in thumb1.md and aim to be as permissive. (define_insn_and_split "@atomic_compare_and_swap<CCSI:arch><NARROW:mode>_1" - [(set (match_operand:CCSI 0 "cc_register_operand" "=&c,&l,&l,&l") ;; bool out + [(set (match_operand:CCSI 0 "cc_register_operand" "=&c,&l,&l") ;; bool out (unspec_volatile:CCSI [(const_int 0)] VUNSPEC_ATOMIC_CAS)) - (set (match_operand:SI 1 "s_register_operand" "=&r,&l,&0,&l*h") ;; val out + (set (match_operand:SI 1 "s_register_operand" "=&r,&l,&l*h") ;; val out (zero_extend:SI - (match_operand:NARROW 2 "mem_noofs_operand" "+Ua,Ua,Ua,Ua"))) ;; memory + (match_operand:NARROW 2 "mem_noofs_operand" "+Ua,Ua,Ua"))) ;; memory (set (match_dup 2) (unspec_volatile:NARROW - [(match_operand:SI 3 "arm_add_operand" "rIL,lIL*h,J,*r") ;; expected - (match_operand:NARROW 4 "s_register_operand" "r,r,r,r") ;; desired + [(match_operand:SI 3 "arm_add_operand" "rIL,lILJ*h,*r") ;; expected + (match_operand:NARROW 4 "s_register_operand" "r,r,r") ;; desired (match_operand:SI 5 "const_int_operand") ;; is_weak (match_operand:SI 6 "const_int_operand") ;; mod_s (match_operand:SI 7 "const_int_operand")] ;; mod_f VUNSPEC_ATOMIC_CAS)) - (clobber (match_scratch:SI 8 "=&r,X,X,X"))] + (clobber (match_scratch:SI 8 "=&r,X,X"))] "<NARROW:sync_predtab>" "#" "&& reload_completed" @@ -209,7 +209,7 @@ arm_split_compare_and_swap (operands); DONE; } - [(set_attr "arch" "32,v8mb,v8mb,v8mb")]) + [(set_attr "arch" "32,v8mb,v8mb")]) (define_mode_attr cas_cmp_operand [(SI "arm_add_operand") (DI "cmpdi_operand")]) @@ -219,19 +219,19 @@ ;; Constraints of this pattern must be at least as strict as those of the ;; cbranchsi operations in thumb1.md and aim to be as permissive. (define_insn_and_split "@atomic_compare_and_swap<CCSI:arch><SIDI:mode>_1" - [(set (match_operand:CCSI 0 "cc_register_operand" "=&c,&l,&l,&l") ;; bool out + [(set (match_operand:CCSI 0 "cc_register_operand" "=&c,&l,&l") ;; bool out (unspec_volatile:CCSI [(const_int 0)] VUNSPEC_ATOMIC_CAS)) - (set (match_operand:SIDI 1 "s_register_operand" "=&r,&l,&0,&l*h") ;; val out - (match_operand:SIDI 2 "mem_noofs_operand" "+Ua,Ua,Ua,Ua")) ;; memory + (set (match_operand:SIDI 1 "s_register_operand" "=&r,&l,&l*h") ;; val out + (match_operand:SIDI 2 "mem_noofs_operand" "+Ua,Ua,Ua")) ;; memory (set (match_dup 2) (unspec_volatile:SIDI - [(match_operand:SIDI 3 "<SIDI:cas_cmp_operand>" "<SIDI:cas_cmp_str>,lIL*h,J,*r") ;; expect - (match_operand:SIDI 4 "s_register_operand" "r,r,r,r") ;; desired + [(match_operand:SIDI 3 "<SIDI:cas_cmp_operand>" "<SIDI:cas_cmp_str>,lILJ*h,*r") ;; expect + (match_operand:SIDI 4 "s_register_operand" "r,r,r") ;; desired (match_operand:SI 5 "const_int_operand") ;; is_weak (match_operand:SI 6 "const_int_operand") ;; mod_s (match_operand:SI 7 "const_int_operand")] ;; mod_f VUNSPEC_ATOMIC_CAS)) - (clobber (match_scratch:SI 8 "=&r,X,X,X"))] + (clobber (match_scratch:SI 8 "=&r,X,X"))] "<SIDI:sync_predtab>" "#" "&& reload_completed" @@ -240,7 +240,7 @@ arm_split_compare_and_swap (operands); DONE; } - [(set_attr "arch" "32,v8mb,v8mb,v8mb")]) + [(set_attr "arch" "32,v8mb,v8mb")]) (define_insn_and_split "atomic_exchange<mode>" [(set (match_operand:QHSD 0 "s_register_operand" "=&r,&r") ;; output diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md index c98b59c4fa7..084ed6597e0 100644 --- a/gcc/config/arm/thumb1.md +++ b/gcc/config/arm/thumb1.md @@ -1206,6 +1206,21 @@ (set_attr "type" "multiple")] ) +;; An expander which makes use of the cbranchsi4_scratch insn, but can +;; be used safely after RA. +(define_expand "cbranchsi4_neg_late" + [(parallel [ + (set (pc) (if_then_else + (match_operator 4 "arm_comparison_operator" + [(match_operand:SI 1 "s_register_operand") + (match_operand:SI 2 "thumb1_cmpneg_operand")]) + (label_ref (match_operand 3 "" "")) + (pc))) + (clobber (match_operand:SI 0 "s_register_operand")) + ])] + "TARGET_THUMB1" +) + ;; Changes to the constraints of this pattern must be propagated to those of ;; atomic compare_and_swap splitters in sync.md. These must be at least as ;; strict as the constraints here and aim to be as permissive. diff --git a/gcc/testsuite/gcc.target/arm/pr99977.c b/gcc/testsuite/gcc.target/arm/pr99977.c new file mode 100644 index 00000000000..7911899d928 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/pr99977.c @@ -0,0 +1,6 @@ +/* { dg-do compile } */ +/* { dg-options "-march=armv8-m.base -mfloat-abi=soft -O2" } */ +_Bool f1(int *p) { return __sync_bool_compare_and_swap (p, -1, 2); } +_Bool f2(int *p) { return __sync_bool_compare_and_swap (p, -8, 2); } +int g1(int *p) { return __sync_val_compare_and_swap (p, -1, 2); } +int g2(int *p) { return __sync_val_compare_and_swap (p, -8, 3); }