Message ID | 20240307011252.11808-1-chenglulu@loongson.cn |
---|---|
State | New |
Headers | show |
Series | [v1] LoongArch: Fixed an issue with the implementation of the template atomic_compare_and_swapsi. | expand |
On Thu, 2024-03-07 at 09:12 +0800, Lulu Cheng wrote: > + output_asm_insn ("1:", operands); > + output_asm_insn ("ll.<amo>\t%0,%1", operands); > + > + /* Like the test case atomic-cas-int.C, in loongarch64, O1 and higher, the > + return value of the val_without_const_folding will not be truncated and > + will be passed directly to the function compare_exchange_strong. > + However, the instruction 'bne' does not distinguish between 32-bit and > + 64-bit operations. so if the upper 32 bits of the register are not > + extended by the 32nd bit symbol, then the comparison may not be valid > + here. This will affect the result of the operation. */ > + > + if (TARGET_64BIT && REG_P (operands[2]) > + && GET_MODE (operands[2]) == SImode) > + { > + output_asm_insn ("addi.w\t%5,%2,0", operands); > + output_asm_insn ("bne\t%0,%5,2f", operands); It should be better to extend the expected value before the ll/sc loop (like what LLVM does), instead of repeating the extending in each iteration. Something like: diff --git a/gcc/config/loongarch/sync.md b/gcc/config/loongarch/sync.md index 8f35a5b48d2..c21781947fd 100644 --- a/gcc/config/loongarch/sync.md +++ b/gcc/config/loongarch/sync.md @@ -234,11 +234,11 @@ (define_insn "atomic_exchange<mode>_short" "amswap%A3.<amo>\t%0,%z2,%1" [(set (attr "length") (const_int 4))]) -(define_insn "atomic_cas_value_strong<mode>" +(define_insn "atomic_cas_value_strong<GPR:mode><X:mode>" [(set (match_operand:GPR 0 "register_operand" "=&r") (match_operand:GPR 1 "memory_operand" "+ZC")) (set (match_dup 1) - (unspec_volatile:GPR [(match_operand:GPR 2 "reg_or_0_operand" "rJ") + (unspec_volatile:GPR [(match_operand:X 2 "reg_or_0_operand" "rJ") (match_operand:GPR 3 "reg_or_0_operand" "rJ") (match_operand:SI 4 "const_int_operand")] ;; mod_s UNSPEC_COMPARE_AND_SWAP)) @@ -246,10 +246,10 @@ (define_insn "atomic_cas_value_strong<mode>" "" { return "1:\\n\\t" - "ll.<amo>\\t%0,%1\\n\\t" + "ll.<GPR:amo>\\t%0,%1\\n\\t" "bne\\t%0,%z2,2f\\n\\t" "or%i3\\t%5,$zero,%3\\n\\t" - "sc.<amo>\\t%5,%1\\n\\t" + "sc.<GPR:amo>\\t%5,%1\\n\\t" "beqz\\t%5,1b\\n\\t" "b\\t3f\\n\\t" "2:\\n\\t" @@ -301,9 +301,23 @@ (define_expand "atomic_compare_and_swap<mode>" operands[3], operands[4], operands[6])); else - emit_insn (gen_atomic_cas_value_strong<mode> (operands[1], operands[2], - operands[3], operands[4], - operands[6])); + { + rtx (*cas)(rtx, rtx, rtx, rtx, rtx) = + TARGET_64BIT ? gen_atomic_cas_value_strong<mode>di + : gen_atomic_cas_value_strong<mode>si; + rtx expect = operands[3]; + + if (<MODE>mode == SImode + && TARGET_64BIT + && operands[3] != const0_rtx) + { + expect = gen_reg_rtx (DImode); + emit_insn (gen_extendsidi2 (expect, operands[3])); + } + + emit_insn (cas (operands[1], operands[2], expect, operands[4], + operands[6])); + } rtx compare = operands[1]; if (operands[3] != const0_rtx) It produces: slli.w $r4,$r4,0 1: ll.w $r14,$r3,0 bne $r14,$r4,2f or $r15,$zero,$r12 sc.w $r15,$r3,0 beqz $r15,1b b 3f 2: dbar 0b10100 3: for the test case and the compiled test case runs successfully. I've not done a full bootstrap yet though.
在 2024/3/7 下午8:52, Xi Ruoyao 写道: > It should be better to extend the expected value before the ll/sc loop > (like what LLVM does), instead of repeating the extending in each > iteration. Something like: I wanted to do this at first, but it didn't work out. But then I thought about it, and there are two benefits to putting it in the middle of ll/sc: 1. If there is an operation that uses the $r4 register after this atomic operation, another register is required to store $r4. 2. ll.w requires long cycles, so putting an addi.w command after ll.w won't make a difference. So based on the above, I didn't try again, but directly made a modification like a patch.
On Thu, 2024-03-07 at 21:07 +0800, chenglulu wrote: > > 在 2024/3/7 下午8:52, Xi Ruoyao 写道: > > It should be better to extend the expected value before the ll/sc loop > > (like what LLVM does), instead of repeating the extending in each > > iteration. Something like: > > I wanted to do this at first, but it didn't work out. > > But then I thought about it, and there are two benefits to putting it in > the middle of ll/sc: > > 1. If there is an operation that uses the $r4 register after this atomic > operation, another > > register is required to store $r4. > > 2. ll.w requires long cycles, so putting an addi.w command after ll.w > won't make a difference. > > So based on the above, I didn't try again, but directly made a > modification like a patch. Ah, the explanation makes sense to me. Ok with the original patch then.
在 2024/3/8 下午2:22, Xi Ruoyao 写道: > On Thu, 2024-03-07 at 21:07 +0800, chenglulu wrote: >> 在 2024/3/7 下午8:52, Xi Ruoyao 写道: >>> It should be better to extend the expected value before the ll/sc loop >>> (like what LLVM does), instead of repeating the extending in each >>> iteration. Something like: >> I wanted to do this at first, but it didn't work out. >> >> But then I thought about it, and there are two benefits to putting it in >> the middle of ll/sc: >> >> 1. If there is an operation that uses the $r4 register after this atomic >> operation, another >> >> register is required to store $r4. >> >> 2. ll.w requires long cycles, so putting an addi.w command after ll.w >> won't make a difference. >> >> So based on the above, I didn't try again, but directly made a >> modification like a patch. > Ah, the explanation makes sense to me. Ok with the original patch then. Ok.Thank you so much!:-)
Pushed to r14-9407. 在 2024/3/7 上午9:12, Lulu Cheng 写道: > If the hardware does not support LAMCAS, atomic_compare_and_swapsi needs to be > implemented through "ll.w+sc.w". In the implementation of the instruction sequence, > it is necessary to determine whether the two registers are equal. > Since LoongArch's comparison instructions do not distinguish between 32-bit > and 64-bit, the two operand registers that need to be compared are symbolically > extended, and one of the operand registers is obtained from memory through the > "ll.w" instruction, which can ensure that the symbolic expansion is carried out. > However, the value of the other operand register is not guaranteed to be the > value of the sign extension. > > gcc/ChangeLog: > > * config/loongarch/sync.md (atomic_cas_value_strong<mode>): > In loongarch64, a sign extension operation is added when > operands[2] is a register operand and the mode is SImode. > > gcc/testsuite/ChangeLog: > > * g++.target/loongarch/atomic-cas-int.C: New test. > --- > gcc/config/loongarch/sync.md | 46 ++++++++++++++----- > .../g++.target/loongarch/atomic-cas-int.C | 32 +++++++++++++ > 2 files changed, 67 insertions(+), 11 deletions(-) > create mode 100644 gcc/testsuite/g++.target/loongarch/atomic-cas-int.C > > diff --git a/gcc/config/loongarch/sync.md b/gcc/config/loongarch/sync.md > index 8f35a5b48d2..d41c2d26811 100644 > --- a/gcc/config/loongarch/sync.md > +++ b/gcc/config/loongarch/sync.md > @@ -245,18 +245,42 @@ (define_insn "atomic_cas_value_strong<mode>" > (clobber (match_scratch:GPR 5 "=&r"))] > "" > { > - return "1:\\n\\t" > - "ll.<amo>\\t%0,%1\\n\\t" > - "bne\\t%0,%z2,2f\\n\\t" > - "or%i3\\t%5,$zero,%3\\n\\t" > - "sc.<amo>\\t%5,%1\\n\\t" > - "beqz\\t%5,1b\\n\\t" > - "b\\t3f\\n\\t" > - "2:\\n\\t" > - "%G4\\n\\t" > - "3:\\n\\t"; > + output_asm_insn ("1:", operands); > + output_asm_insn ("ll.<amo>\t%0,%1", operands); > + > + /* Like the test case atomic-cas-int.C, in loongarch64, O1 and higher, the > + return value of the val_without_const_folding will not be truncated and > + will be passed directly to the function compare_exchange_strong. > + However, the instruction 'bne' does not distinguish between 32-bit and > + 64-bit operations. so if the upper 32 bits of the register are not > + extended by the 32nd bit symbol, then the comparison may not be valid > + here. This will affect the result of the operation. */ > + > + if (TARGET_64BIT && REG_P (operands[2]) > + && GET_MODE (operands[2]) == SImode) > + { > + output_asm_insn ("addi.w\t%5,%2,0", operands); > + output_asm_insn ("bne\t%0,%5,2f", operands); > + } > + else > + output_asm_insn ("bne\t%0,%z2,2f", operands); > + > + output_asm_insn ("or%i3\t%5,$zero,%3", operands); > + output_asm_insn ("sc.<amo>\t%5,%1", operands); > + output_asm_insn ("beqz\t%5,1b", operands); > + output_asm_insn ("b\t3f", operands); > + output_asm_insn ("2:", operands); > + output_asm_insn ("%G4", operands); > + output_asm_insn ("3:", operands); > + > + return ""; > } > - [(set (attr "length") (const_int 28))]) > + [(set (attr "length") > + (if_then_else > + (and (match_test "GET_MODE (operands[2]) == SImode") > + (match_test "REG_P (operands[2])")) > + (const_int 32) > + (const_int 28)))]) > > (define_insn "atomic_cas_value_strong<mode>_amcas" > [(set (match_operand:QHWD 0 "register_operand" "=&r") > diff --git a/gcc/testsuite/g++.target/loongarch/atomic-cas-int.C b/gcc/testsuite/g++.target/loongarch/atomic-cas-int.C > new file mode 100644 > index 00000000000..830ce48267a > --- /dev/null > +++ b/gcc/testsuite/g++.target/loongarch/atomic-cas-int.C > @@ -0,0 +1,32 @@ > +/* { dg-do run } */ > +/* { dg-options "-O2" } */ > + > +#include <atomic> > +#include <cstdio> > + > +__attribute__ ((noinline)) long > +val_without_const_folding (long val) > +{ > + return val; > +} > + > +int > +main () > +{ > + int oldval = 0xaa; > + int newval = 0xbb; > + std::atomic<int> amo; > + > + amo.store (oldval); > + > + long longval = val_without_const_folding (0xff80000000000000 + oldval); > + oldval = static_cast<int> (longval); > + > + amo.compare_exchange_strong (oldval, newval); > + > + if (newval != amo.load (std::memory_order_relaxed)) > + __builtin_abort (); > + > + return 0; > +} > +
在 2024/3/9 上午9:48, chenglulu 写道: > Pushed to r14-9407. Cherry picked to r13-8413 and r12-10200. > > 在 2024/3/7 上午9:12, Lulu Cheng 写道: >> If the hardware does not support LAMCAS, atomic_compare_and_swapsi >> needs to be >> implemented through "ll.w+sc.w". In the implementation of the >> instruction sequence, >> it is necessary to determine whether the two registers are equal. >> Since LoongArch's comparison instructions do not distinguish between >> 32-bit >> and 64-bit, the two operand registers that need to be compared are >> symbolically >> extended, and one of the operand registers is obtained from memory >> through the >> "ll.w" instruction, which can ensure that the symbolic expansion is >> carried out. >> However, the value of the other operand register is not guaranteed to >> be the >> value of the sign extension. >> >> gcc/ChangeLog: >> >> * config/loongarch/sync.md (atomic_cas_value_strong<mode>): >> In loongarch64, a sign extension operation is added when >> operands[2] is a register operand and the mode is SImode. >> >> gcc/testsuite/ChangeLog: >> >> * g++.target/loongarch/atomic-cas-int.C: New test. >> --- >> gcc/config/loongarch/sync.md | 46 ++++++++++++++----- >> .../g++.target/loongarch/atomic-cas-int.C | 32 +++++++++++++ >> 2 files changed, 67 insertions(+), 11 deletions(-) >> create mode 100644 gcc/testsuite/g++.target/loongarch/atomic-cas-int.C >> >> diff --git a/gcc/config/loongarch/sync.md b/gcc/config/loongarch/sync.md >> index 8f35a5b48d2..d41c2d26811 100644 >> --- a/gcc/config/loongarch/sync.md >> +++ b/gcc/config/loongarch/sync.md >> @@ -245,18 +245,42 @@ (define_insn "atomic_cas_value_strong<mode>" >> (clobber (match_scratch:GPR 5 "=&r"))] >> "" >> { >> - return "1:\\n\\t" >> - "ll.<amo>\\t%0,%1\\n\\t" >> - "bne\\t%0,%z2,2f\\n\\t" >> - "or%i3\\t%5,$zero,%3\\n\\t" >> - "sc.<amo>\\t%5,%1\\n\\t" >> - "beqz\\t%5,1b\\n\\t" >> - "b\\t3f\\n\\t" >> - "2:\\n\\t" >> - "%G4\\n\\t" >> - "3:\\n\\t"; >> + output_asm_insn ("1:", operands); >> + output_asm_insn ("ll.<amo>\t%0,%1", operands); >> + >> + /* Like the test case atomic-cas-int.C, in loongarch64, O1 and >> higher, the >> + return value of the val_without_const_folding will not be >> truncated and >> + will be passed directly to the function compare_exchange_strong. >> + However, the instruction 'bne' does not distinguish between >> 32-bit and >> + 64-bit operations. so if the upper 32 bits of the register are >> not >> + extended by the 32nd bit symbol, then the comparison may not be >> valid >> + here. This will affect the result of the operation. */ >> + >> + if (TARGET_64BIT && REG_P (operands[2]) >> + && GET_MODE (operands[2]) == SImode) >> + { >> + output_asm_insn ("addi.w\t%5,%2,0", operands); >> + output_asm_insn ("bne\t%0,%5,2f", operands); >> + } >> + else >> + output_asm_insn ("bne\t%0,%z2,2f", operands); >> + >> + output_asm_insn ("or%i3\t%5,$zero,%3", operands); >> + output_asm_insn ("sc.<amo>\t%5,%1", operands); >> + output_asm_insn ("beqz\t%5,1b", operands); >> + output_asm_insn ("b\t3f", operands); >> + output_asm_insn ("2:", operands); >> + output_asm_insn ("%G4", operands); >> + output_asm_insn ("3:", operands); >> + >> + return ""; >> } >> - [(set (attr "length") (const_int 28))]) >> + [(set (attr "length") >> + (if_then_else >> + (and (match_test "GET_MODE (operands[2]) == SImode") >> + (match_test "REG_P (operands[2])")) >> + (const_int 32) >> + (const_int 28)))]) >> (define_insn "atomic_cas_value_strong<mode>_amcas" >> [(set (match_operand:QHWD 0 "register_operand" "=&r") >> diff --git a/gcc/testsuite/g++.target/loongarch/atomic-cas-int.C >> b/gcc/testsuite/g++.target/loongarch/atomic-cas-int.C >> new file mode 100644 >> index 00000000000..830ce48267a >> --- /dev/null >> +++ b/gcc/testsuite/g++.target/loongarch/atomic-cas-int.C >> @@ -0,0 +1,32 @@ >> +/* { dg-do run } */ >> +/* { dg-options "-O2" } */ >> + >> +#include <atomic> >> +#include <cstdio> >> + >> +__attribute__ ((noinline)) long >> +val_without_const_folding (long val) >> +{ >> + return val; >> +} >> + >> +int >> +main () >> +{ >> + int oldval = 0xaa; >> + int newval = 0xbb; >> + std::atomic<int> amo; >> + >> + amo.store (oldval); >> + >> + long longval = val_without_const_folding (0xff80000000000000 + >> oldval); >> + oldval = static_cast<int> (longval); >> + >> + amo.compare_exchange_strong (oldval, newval); >> + >> + if (newval != amo.load (std::memory_order_relaxed)) >> + __builtin_abort (); >> + >> + return 0; >> +} >> +
diff --git a/gcc/config/loongarch/sync.md b/gcc/config/loongarch/sync.md index 8f35a5b48d2..d41c2d26811 100644 --- a/gcc/config/loongarch/sync.md +++ b/gcc/config/loongarch/sync.md @@ -245,18 +245,42 @@ (define_insn "atomic_cas_value_strong<mode>" (clobber (match_scratch:GPR 5 "=&r"))] "" { - return "1:\\n\\t" - "ll.<amo>\\t%0,%1\\n\\t" - "bne\\t%0,%z2,2f\\n\\t" - "or%i3\\t%5,$zero,%3\\n\\t" - "sc.<amo>\\t%5,%1\\n\\t" - "beqz\\t%5,1b\\n\\t" - "b\\t3f\\n\\t" - "2:\\n\\t" - "%G4\\n\\t" - "3:\\n\\t"; + output_asm_insn ("1:", operands); + output_asm_insn ("ll.<amo>\t%0,%1", operands); + + /* Like the test case atomic-cas-int.C, in loongarch64, O1 and higher, the + return value of the val_without_const_folding will not be truncated and + will be passed directly to the function compare_exchange_strong. + However, the instruction 'bne' does not distinguish between 32-bit and + 64-bit operations. so if the upper 32 bits of the register are not + extended by the 32nd bit symbol, then the comparison may not be valid + here. This will affect the result of the operation. */ + + if (TARGET_64BIT && REG_P (operands[2]) + && GET_MODE (operands[2]) == SImode) + { + output_asm_insn ("addi.w\t%5,%2,0", operands); + output_asm_insn ("bne\t%0,%5,2f", operands); + } + else + output_asm_insn ("bne\t%0,%z2,2f", operands); + + output_asm_insn ("or%i3\t%5,$zero,%3", operands); + output_asm_insn ("sc.<amo>\t%5,%1", operands); + output_asm_insn ("beqz\t%5,1b", operands); + output_asm_insn ("b\t3f", operands); + output_asm_insn ("2:", operands); + output_asm_insn ("%G4", operands); + output_asm_insn ("3:", operands); + + return ""; } - [(set (attr "length") (const_int 28))]) + [(set (attr "length") + (if_then_else + (and (match_test "GET_MODE (operands[2]) == SImode") + (match_test "REG_P (operands[2])")) + (const_int 32) + (const_int 28)))]) (define_insn "atomic_cas_value_strong<mode>_amcas" [(set (match_operand:QHWD 0 "register_operand" "=&r") diff --git a/gcc/testsuite/g++.target/loongarch/atomic-cas-int.C b/gcc/testsuite/g++.target/loongarch/atomic-cas-int.C new file mode 100644 index 00000000000..830ce48267a --- /dev/null +++ b/gcc/testsuite/g++.target/loongarch/atomic-cas-int.C @@ -0,0 +1,32 @@ +/* { dg-do run } */ +/* { dg-options "-O2" } */ + +#include <atomic> +#include <cstdio> + +__attribute__ ((noinline)) long +val_without_const_folding (long val) +{ + return val; +} + +int +main () +{ + int oldval = 0xaa; + int newval = 0xbb; + std::atomic<int> amo; + + amo.store (oldval); + + long longval = val_without_const_folding (0xff80000000000000 + oldval); + oldval = static_cast<int> (longval); + + amo.compare_exchange_strong (oldval, newval); + + if (newval != amo.load (std::memory_order_relaxed)) + __builtin_abort (); + + return 0; +} +