Message ID | 20230830215708.369610-1-vineetg@rivosinc.com |
---|---|
State | New |
Headers | show |
Series | RISC-V: zicond: remove bogus opt2 pattern | expand |
On 8/30/23 15:57, Vineet Gupta wrote: > This was tripping up gcc.c-torture/execute/pr60003.c at -O1 since the > pattern semantics can't be expressed by zicond instructions. > > This involves test code snippet: > > if (a == 0) > return 0; > else > return x; > } > > which is equivalent to: "x = (a != 0) ? x : a" Isn't it x = (a == 0) ? 0 : x Which seems like it ought to fit zicond just fine. If we take yours; x = (a != 0) ? x : a And simplify with the known value of a on the false arm we get: x = (a != 0 ) ? x : 0; Which is equivalent to x = (a == 0) ? 0 : x; So ISTM this does fit zicond just fine. > > and matches define_insn "*czero.nez.<GPR:mode><X:mode>.opt2" > > | (insn 41 20 38 3 (set (reg/v:DI 136 [ x ]) > | (if_then_else:DI (ne (reg/v:DI 134 [ a ]) > | (const_int 0 [0])) > | (reg/v:DI 136 [ x ]) > | (reg/v:DI 134 [ a ]))) {*czero.nez.didi.opt2} > > The corresponding asm pattern generates > czero.nez x, x, a ; %0, %2, %1 > implying > "x = (a != 0) ? 0 : a" I get this from the RTL pattern: x = (a != 0) ? x : a x = (a != 0) ? x : 0 I think you got the arms reversed. > > which is not what the pattern semantics are. > > Essentially "(a != 0) ? x : a" cannot be expressed with CZERO.nez Agreed, but I think you goof'd earlier :-) Jeff
On 8/31/23 06:51, Jeff Law wrote: > > > On 8/30/23 15:57, Vineet Gupta wrote: >> This was tripping up gcc.c-torture/execute/pr60003.c at -O1 since the >> pattern semantics can't be expressed by zicond instructions. >> >> This involves test code snippet: >> >> if (a == 0) >> return 0; >> else >> return x; >> } >> >> which is equivalent to: "x = (a != 0) ? x : a" > Isn't it > > x = (a == 0) ? 0 : x > > Which seems like it ought to fit zicond just fine. Logically they are equivalent, but .... > > If we take yours; > > x = (a != 0) ? x : a > > And simplify with the known value of a on the false arm we get: > > x = (a != 0 ) ? x : 0; > > Which is equivalent to > > x = (a == 0) ? 0 : x; > > So ISTM this does fit zicond just fine. I could very well be mistaken, but define_insn is a pattern match and opt2 has *ne* so the expression has to be in != form and thus needs to work with that condition. No ? >> and matches define_insn "*czero.nez.<GPR:mode><X:mode>.opt2" >> >> | (insn 41 20 38 3 (set (reg/v:DI 136 [ x ]) >> | (if_then_else:DI (ne (reg/v:DI 134 [ a ]) >> | (const_int 0 [0])) >> | (reg/v:DI 136 [ x ]) >> | (reg/v:DI 134 [ a ]))) {*czero.nez.didi.opt2} >> >> The corresponding asm pattern generates >> czero.nez x, x, a ; %0, %2, %1 >> implying >> "x = (a != 0) ? 0 : a" > I get this from the RTL pattern: > > x = (a != 0) ? x : a > x = (a != 0) ? x : 0 This is the issue, for ne, czero.nez can only express x = (a != 0) ? 0 : x > > I think you got the arms reversed. What I meant was czero.nez as specified in RTL pattern would generate x = (a != 0) ? 0 : a, whereas pattern's desired semantics is (a != 0) ? x : 0 And that is a problem because after all equivalents/simplifications, a ternary operation's middle operand has to be zero to map to czero*, but it doesn't for the opt2 RTL semantics. I've sat on this for 2 days, trying to convince myself I was wrong, but as it stands, it was generating wrong code in the test which is fixed after the patch. Thx, -Vineet
On 8/31/23 11:57, Vineet Gupta wrote: > > > On 8/31/23 06:51, Jeff Law wrote: >> >> >> On 8/30/23 15:57, Vineet Gupta wrote: >>> This was tripping up gcc.c-torture/execute/pr60003.c at -O1 since the >>> pattern semantics can't be expressed by zicond instructions. >>> >>> This involves test code snippet: >>> >>> if (a == 0) >>> return 0; >>> else >>> return x; >>> } >>> >>> which is equivalent to: "x = (a != 0) ? x : a" >> Isn't it >> >> x = (a == 0) ? 0 : x >> >> Which seems like it ought to fit zicond just fine. > > Logically they are equivalent, but .... > >> >> If we take yours; >> >> x = (a != 0) ? x : a >> >> And simplify with the known value of a on the false arm we get: >> >> x = (a != 0 ) ? x : 0; >> >> Which is equivalent to >> >> x = (a == 0) ? 0 : x; >> >> So ISTM this does fit zicond just fine. > > I could very well be mistaken, but define_insn is a pattern match and > opt2 has *ne* so the expression has to be in != form and thus needs to > work with that condition. No ? My point was that x = (a != 0) ? x : 0 is equivalent to x = (a == 0) ? 0 : x You can invert the condition and swap the arms and get the same semantics. Thus if one can be supported, so can the other as they're functionally equivalent. It may be the at we've goof'd something in handling the inverted case, but conceptually we ought to be able to handle both. I don't doubt you've got a failure, but it's also the case that I'm not seeing the same failure when I turn on zicond and run the execute.exp tests. So clearly there's a difference somewhere in what we're doing. So perhaps we should start with comparing assembly output for the test in question. Can you pass yours along, I'll diff them this afternoon and see what we find. jeff
On Thu, 31 Aug 2023 10:57:52 PDT (-0700), Vineet Gupta wrote: > > > On 8/31/23 06:51, Jeff Law wrote: >> >> >> On 8/30/23 15:57, Vineet Gupta wrote: >>> This was tripping up gcc.c-torture/execute/pr60003.c at -O1 since the >>> pattern semantics can't be expressed by zicond instructions. >>> >>> This involves test code snippet: >>> >>> if (a == 0) >>> return 0; >>> else >>> return x; >>> } >>> >>> which is equivalent to: "x = (a != 0) ? x : a" >> Isn't it >> >> x = (a == 0) ? 0 : x >> >> Which seems like it ought to fit zicond just fine. > > Logically they are equivalent, but .... > >> >> If we take yours; >> >> x = (a != 0) ? x : a >> >> And simplify with the known value of a on the false arm we get: >> >> x = (a != 0 ) ? x : 0; >> >> Which is equivalent to >> >> x = (a == 0) ? 0 : x; >> >> So ISTM this does fit zicond just fine. > > I could very well be mistaken, but define_insn is a pattern match and > opt2 has *ne* so the expression has to be in != form and thus needs to > work with that condition. No ? > >>> and matches define_insn "*czero.nez.<GPR:mode><X:mode>.opt2" >>> >>> | (insn 41 20 38 3 (set (reg/v:DI 136 [ x ]) >>> | (if_then_else:DI (ne (reg/v:DI 134 [ a ]) >>> | (const_int 0 [0])) >>> | (reg/v:DI 136 [ x ]) >>> | (reg/v:DI 134 [ a ]))) {*czero.nez.didi.opt2} >>> >>> The corresponding asm pattern generates >>> czero.nez x, x, a ; %0, %2, %1 >>> implying >>> "x = (a != 0) ? 0 : a" >> I get this from the RTL pattern: >> >> x = (a != 0) ? x : a >> x = (a != 0) ? x : 0 > > This is the issue, for ne, czero.nez can only express > x = (a != 0) ? 0 : x > >> >> I think you got the arms reversed. Just working through this in email, as there's a lot of double-negatives and I managed to screw up my Linux PR this morning so I may not be thinking that well... The docs say "(if_then_else test true-value false-value)". So in this case it's test: (ne (match_operand:X 1 "register_operand" "r") (const_int 0)) true: (match_operand:GPR 2 "register_operand" "r") false: (match_operand:GPR 3 "register_operand" "1") == (match_operand:X 1 "register_operand" "r") and we're encoding it as czero.nez %0,%2,%1 so that's rd: output rs1: on-true rs2: condition (the value inside the ne in RTL) That looks correct to me: the instruction's condition source register is inside a "(ne ... 0)", but we're doing the cmov.nez so it looks OK. The rest of the zero juggling looks sane as well -- I'm not sure if the X vs GPR mismatch will confuse something else, but it should be caught by the rtx_equal_p() and thus should at least be safe. > What I meant was czero.nez as specified in RTL pattern would generate x > = (a != 0) ? 0 : a, whereas pattern's desired semantics is (a != 0) ? x : 0 > And that is a problem because after all equivalents/simplifications, a > ternary operation's middle operand has to be zero to map to czero*, but > it doesn't for the opt2 RTL semantics. > > I've sat on this for 2 days, trying to convince myself I was wrong, but > as it stands, it was generating wrong code in the test which is fixed > after the patch. It might be easier for everyone to understand if you add a specific testcase for just the broken codegen. I'm not having luck constructing a small reproducer (though I don't have a clean tree lying around, so I might have screwed something up here). IIUC something like long func(long x, long a) { if (a != 0) return x; return 0; } should do it, but I'm getting func: czero.eqz a0,a0,a1 ret which looks right to me -- though it's not triggering this pattern, so not sure that means much. > > Thx, > -Vineet
On 9/1/23 06:13, Jeff Law wrote: >> I could very well be mistaken, but define_insn is a pattern match and >> opt2 has *ne* so the expression has to be in != form and thus needs >> to work with that condition. No ? > My point was that > > x = (a != 0) ? x : 0 > > is equivalent to > > x = (a == 0) ? 0 : x > > You can invert the condition and swap the arms and get the same > semantics. Thus if one can be supported, so can the other as they're > functionally equivalent. Ah I see what you mean. Indeed the pattern is fine, it just doesn't map to the right asm. So we certainly need a fix but it could just very well be this: (define_insn "*czero.nez.<GPR:mode><X:mode>.opt2" [(set (match_operand:GPR 0 "register_operand" "=r") (if_then_else:GPR (ne (match_operand:X 1 "register_operand" "r") (const_int 0)) (match_operand:GPR 2 "register_operand" "r") (match_operand:GPR 3 "register_operand" "1")))] "TARGET_ZICOND && rtx_equal_p (operands[1], operands[3])" - "czero.nez\t%0,%2,%1" } "czero.eqz\t%0,%2,%1" ) > It may be the at we've goof'd something in handling the inverted case, > but conceptually we ought to be able to handle both. Indeed there's a small goof as shown above. > > I don't doubt you've got a failure, but it's also the case that I'm > not seeing the same failure when I turn on zicond and run the > execute.exp tests. So clearly there's a difference somewhere in what > we're doing. It doesn't show up in execute.exp but as following (perhaps I should add that to commit log too). FAIL: gcc.c-torture/execute/pr60003.c -O1 execution test FAIL: gcc.dg/setjmp-3.c execution test FAIL: gcc.dg/torture/stackalign/setjmp-3.c -O1 execution test FAIL: gcc.dg/torture/stackalign/setjmp-3.c -O1 -fpic execution test > > So perhaps we should start with comparing assembly output for the test > in question. Can you pass yours along, I'll diff them this afternoon > and see what we find. Attached is slightly modified pr60003.c (to differentiate 'X' and 'a') and the failing asm and with fix (both the deleted pattern and modified pattern produce correct, if slightly different code). Thx, -Vineet .file "pr60003.c" .option nopic # GNU C17 (GCC) version 14.0.0 20230830 (experimental) (riscv-unknown-elf) # compiled by GNU C version 11.4.0, GMP version 6.1.0, MPFR version 3.1.4, MPC version 1.0.3, isl version isl-0.18-GMP # GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072 # options passed: -march=rv64gc_zba_zbb_zbs_zicond -mabi=lp64d -O1 .text .align 1 .globl baz .type baz, @function baz: addi sp,sp,-16 #,, sd ra,8(sp) #, sd s0,0(sp) #, addi s0,sp,16 #,, # pr60003.c:11: __builtin_longjmp (&jmp_buf, 1); lui a5,%hi(.LANCHOR0) # tmp135, addi a5,a5,%lo(.LANCHOR0) # tmp134, tmp135, ld a4,8(a5) # tmp136, ld a3,0(a5) # tmp137, ld sp,16(a5) #, mv s0,a3 #, tmp137 jr a4 # tmp136 .size baz, .-baz .align 1 .globl bar .type bar, @function bar: addi sp,sp,-16 #,, sd ra,8(sp) #, # pr60003.c:17: baz (); call baz # .size bar, .-bar .align 1 .globl foo .type foo, @function foo: addi sp,sp,-224 #,, sd ra,216(sp) #, sd s0,208(sp) #, sd s1,200(sp) #, sd s2,192(sp) #, sd s3,184(sp) #, sd s4,176(sp) #, sd s5,168(sp) #, sd s6,160(sp) #, sd s7,152(sp) #, sd s8,144(sp) #, sd s9,136(sp) #, sd s10,128(sp) #, sd s11,120(sp) #, fsd fs0,104(sp) #, fsd fs1,96(sp) #, fsd fs2,88(sp) #, fsd fs3,80(sp) #, fsd fs4,72(sp) #, fsd fs5,64(sp) #, fsd fs6,56(sp) #, fsd fs7,48(sp) #, fsd fs8,40(sp) #, fsd fs9,32(sp) #, fsd fs10,24(sp) #, fsd fs11,16(sp) #, sd a0,8(sp) # tmp142, %sfp # pr60003.c:25: if (__builtin_setjmp (&jmp_buf) == 0) lui a5,%hi(.LANCHOR0) # tmp138, addi a5,a5,%lo(.LANCHOR0) # sd s0,0(a5) # ^^^^^^^ orig X (2) ^^^^^^^^^ lui a4,%hi(.L6) # addi a4,a4,%lo(.L6) # sd a4,8(a5) # sd sp,16(a5) #, # pr60003.c:17: baz (); call baz # .L6: # pr60003.c:25: if (__builtin_setjmp (&jmp_buf) == 0) # pr60003.c:40: } li a5,1 # ^^^^ (1) ^^^^^^^^^ ld a4,8(sp) # ^^^^ orig X (2) ^^^^^^^^^ czero.nez a0,a4,a5 # ^^^^ BUG since both ops are non zero ld ra,216(sp) #, ld s0,208(sp) #, ld s1,200(sp) #, ld s2,192(sp) #, ld s3,184(sp) #, ld s4,176(sp) #, ld s5,168(sp) #, ld s6,160(sp) #, ld s7,152(sp) #, ld s8,144(sp) #, ld s9,136(sp) #, ld s10,128(sp) #, ld s11,120(sp) #, fld fs0,104(sp) #, fld fs1,96(sp) #, fld fs2,88(sp) #, fld fs3,80(sp) #, fld fs4,72(sp) #, fld fs5,64(sp) #, fld fs6,56(sp) #, fld fs7,48(sp) #, fld fs8,40(sp) #, fld fs9,32(sp) #, fld fs10,24(sp) #, fld fs11,16(sp) #, addi sp,sp,224 #,, jr ra # .size foo, .-foo .align 1 .globl main .type main, @function main: addi sp,sp,-16 #,, sd ra,8(sp) #, # pr60003.c:45: if (foo (2) == 0) // orig test has foo (1) li a0,2 #, call foo # # pr60003.c:49: } seqz a0,a0 #, tmp143 ld ra,8(sp) #, addi sp,sp,16 #,, jr ra # .size main, .-main .globl jmp_buf .bss .align 3 .set .LANCHOR0,. + 0 .type jmp_buf, @object .size jmp_buf, 40 jmp_buf: .zero 40 .ident "GCC: (GNU) 14.0.0 20230830 (experimental)" .file "pr60003.c" .option nopic .text .align 1 .globl baz .type baz, @function baz: addi sp,sp,-16 sd ra,8(sp) sd s0,0(sp) addi s0,sp,16 lui a5,%hi(.LANCHOR0) addi a5,a5,%lo(.LANCHOR0) ld a4,8(a5) ld a3,0(a5) ld sp,16(a5) mv s0,a3 jr a4 .size baz, .-baz .align 1 .globl bar .type bar, @function bar: addi sp,sp,-16 sd ra,8(sp) call baz .size bar, .-bar .align 1 .globl foo .type foo, @function foo: addi sp,sp,-224 sd ra,216(sp) sd s0,208(sp) sd s1,200(sp) sd s2,192(sp) sd s3,184(sp) sd s4,176(sp) sd s5,168(sp) sd s6,160(sp) sd s7,152(sp) sd s8,144(sp) sd s9,136(sp) sd s10,128(sp) sd s11,120(sp) fsd fs0,104(sp) fsd fs1,96(sp) fsd fs2,88(sp) fsd fs3,80(sp) fsd fs4,72(sp) fsd fs5,64(sp) fsd fs6,56(sp) fsd fs7,48(sp) fsd fs8,40(sp) fsd fs9,32(sp) fsd fs10,24(sp) fsd fs11,16(sp) sd a0,8(sp) lui a5,%hi(.LANCHOR0) addi a5,a5,%lo(.LANCHOR0) sd s0,0(a5) lui a4,%hi(.L6) addi a4,a4,%lo(.L6) sd a4,8(a5) sd sp,16(a5) call baz .L6: li a5,1 ld a4,8(sp) czero.eqz a0,a4,a5 ld ra,216(sp) ld s0,208(sp) ld s1,200(sp) ld s2,192(sp) ld s3,184(sp) ld s4,176(sp) ld s5,168(sp) ld s6,160(sp) ld s7,152(sp) ld s8,144(sp) ld s9,136(sp) ld s10,128(sp) ld s11,120(sp) fld fs0,104(sp) fld fs1,96(sp) fld fs2,88(sp) fld fs3,80(sp) fld fs4,72(sp) fld fs5,64(sp) fld fs6,56(sp) fld fs7,48(sp) fld fs8,40(sp) fld fs9,32(sp) fld fs10,24(sp) fld fs11,16(sp) addi sp,sp,224 jr ra .size foo, .-foo .align 1 .globl main .type main, @function main: addi sp,sp,-16 sd ra,8(sp) li a0,2 call foo seqz a0,a0 ld ra,8(sp) addi sp,sp,16 jr ra .size main, .-main .globl jmp_buf .bss .align 3 .set .LANCHOR0,. + 0 .type jmp_buf, @object .size jmp_buf, 40 jmp_buf: .zero 40 .ident "GCC: (GNU) 14.0.0 20230825 (experimental)"
On 9/1/23 10:40, Palmer Dabbelt wrote: > > Just working through this in email, as there's a lot of > double-negatives and I managed to screw up my Linux PR this morning so > I may not be thinking that well... > > The docs say "(if_then_else test true-value false-value)". So in this > case it's > > test: (ne (match_operand:X 1 "register_operand" "r") (const_int 0)) > true: (match_operand:GPR 2 "register_operand" "r") > false: (match_operand:GPR 3 "register_operand" "1") == > (match_operand:X 1 "register_operand" "r") > > and we're encoding it as > > czero.nez %0,%2,%1 > > so that's > > rd: output > rs1: on-true > rs2: condition (the value inside the ne in RTL) > > That looks correct to me: the instruction's condition source register > is inside a "(ne ... 0)", but we're doing the cmov.nez so it looks OK. Yes it is fine, until you end up having both operand 2 and operand 3 have non-zero values at runtime and somehow match this pattern Then the semantics of czero* are not honored. > It might be easier for everyone to understand if you add a specific > testcase for just the broken codegen. I'm not having luck > constructing a small reproducer (though I don't have a clean tree > lying around, so I might have screwed something up here). > > IIUC something like > > long func(long x, long a) { > if (a != 0) > return x; > return 0; > } > > should do it, but I'm getting > > func: > czero.eqz a0,a0,a1 > ret Unfortunately tests any simpler don't trigger it - they code seqs just get optimized away - otherwise Jeff would have found this 3 weeks ago ;-) Just use gcc/testsuite/gcc.c-torture/execute/pr60003.c Thx, -Vineet
diff --git a/gcc/config/riscv/zicond.md b/gcc/config/riscv/zicond.md index 25f21d33487e..aa5607a9efd8 100644 --- a/gcc/config/riscv/zicond.md +++ b/gcc/config/riscv/zicond.md @@ -52,13 +52,3 @@ "TARGET_ZICOND && rtx_equal_p (operands[1], operands[2])" "czero.eqz\t%0,%3,%1" ) - -(define_insn "*czero.nez.<GPR:mode><X:mode>.opt2" - [(set (match_operand:GPR 0 "register_operand" "=r") - (if_then_else:GPR (ne (match_operand:X 1 "register_operand" "r") - (const_int 0)) - (match_operand:GPR 2 "register_operand" "r") - (match_operand:GPR 3 "register_operand" "1")))] - "TARGET_ZICOND && rtx_equal_p (operands[1], operands[3])" - "czero.nez\t%0,%2,%1" -)
This was tripping up gcc.c-torture/execute/pr60003.c at -O1 since the pattern semantics can't be expressed by zicond instructions. This involves test code snippet: if (a == 0) return 0; else return x; } which is equivalent to: "x = (a != 0) ? x : a" and matches define_insn "*czero.nez.<GPR:mode><X:mode>.opt2" | (insn 41 20 38 3 (set (reg/v:DI 136 [ x ]) | (if_then_else:DI (ne (reg/v:DI 134 [ a ]) | (const_int 0 [0])) | (reg/v:DI 136 [ x ]) | (reg/v:DI 134 [ a ]))) {*czero.nez.didi.opt2} The corresponding asm pattern generates czero.nez x, x, a ; %0, %2, %1 implying "x = (a != 0) ? 0 : a" which is not what the pattern semantics are. Essentially "(a != 0) ? x : a" cannot be expressed with CZERO.nez As a side note, while correctness prevails, this test still gets a czero in the end, albeit a different one. if-convert generates two if_then_else | (insn 43 20 44 3 (set (reg:DI 143) | (reg/v:DI 136 [ x ])) "pr60003.c":36:9 179 {*movdi_64bit} | (insn 44 43 46 3 (set (reg:DI 142) | (reg/v:DI 134 [ a ])) "pr60003.c":36:9 179 {*movdi_64bit} | | (insn 46 44 47 3 (set (reg:DI 145) | (if_then_else:DI (ne:DI (reg/v:DI 134 [ a ]) | (const_int 0 [0])) | (const_int 0 [0]) | (reg:DI 142))) "pr60003.c":36:9 14532 {*czero.nez.didi} | | (insn 47 46 48 3 (set (reg:DI 144) | (if_then_else:DI (eq:DI (reg/v:DI 134 [ a ]) | (const_int 0 [0])) | (const_int 0 [0]) | (reg:DI 143))) "pr60003.c":36:9 14531 {*czero.eqz.didi} and combine is able to fuse them together | (insn 38 48 39 3 (set (reg/i:DI 10 a0) | (if_then_else:DI (eq:DI (reg/v:DI 134 [ a ]) | (const_int 0 [0])) | (const_int 0 [0]) | (reg:DI 143))) "pr60003.c":40:1 14531 {*czero.eqz.didi} before fix after fix ----------------- ----------------- li a5,1 li a0,1 ld a4,8(sp) ld a5,8(sp) czero.nez a0,a4,a5 czero.eqz a0,a5,a0 The issue only happens at -O1 as at higher optimization levels, the whole conditional move gets optimized away. gcc/ChangeLog: * config/riscv/zicond.md: Remove incorrect op2 pattern. Fixes: 1d5bc3285e8a ("[committed][RISC-V] Fix 20010221-1.c with zicond") Signed-off-by: Vineet Gupta <vineetg@rivosinc.com> --- gcc/config/riscv/zicond.md | 10 ---------- 1 file changed, 10 deletions(-)