Message ID | cc14c02e-2ea8-17e4-b44f-6991d1a992dc@gmail.com |
---|---|
State | New |
Headers | show |
Series | [committed,RISC-V] Remove errant hunk of code | expand |
On Thu, 03 Aug 2023 08:05:09 PDT (-0700), gcc-patches@gcc.gnu.org wrote: > > I'm using this hunk locally to more thoroughly exercise the zicond paths > due to inaccuracies elsewhere in the costing model. It was never > supposed to be part of the costing commit though. And as we've seen > it's causing problems with the vector bits. > > While my testing isn't complete, this hunk was never supposed to be > pushed and it's causing problems. So I'm just ripping it out. > > There's a bigger TODO in this space WRT a top-to-bottom evaluation of > the costing on RISC-V. I'm still formulating what that evaluation is > going to look like, so don't hold your breath waiting on it. We touched on it a bit in the call a few days ago, but I definately agree that's worth doing: the current cost and pipeline models are from when the ISA was a lot simpler, it's changed a lot over the last few years so we're likely to need a bunch of work here. At a bare minimum there's some refactoring that could be done to make the code saner, it's been through a few rounds of work so there's some cruft. Our rough plan was to get together some microbenchmarks to drive the model. We're using Microprobe for that, Patrick has been updating it to support various new ISA extensions -- I think that's all upstream already. That'll spit out throughput/latency tables that we can build the pipeline/cost models on top of. The hope was to run these on some extant systems so we could start answering questions like that overlapping stores on the C906, but we haven't gotten around to that yet. Everything past microprobe is just ideas right now. The cost/pipeline model related issues on the scalar side look small so most of the worry is on vector. The general theory is that we're going to need a lot of work on vector codegen to get things going fast for us, but it's all very microarchitecture specific. We're not aiming for any of this for GCC 14. So if you guys have time to look that'd be awesome, I don't think anyone over here will conflict with it any time soon -- aside from whatever falls out of bugs and the generic optimization work, but no way around that sort of thing. > Pushed to the trunk. > commit d61efa3cd3378be38738bfb5139925d1505c1325 > Author: Jeff Law <jeffreyalaw@gmail.com> > Date: Thu Aug 3 10:57:23 2023 -0400 > > [committed][RISC-V] Remove errant hunk of code > > I'm using this hunk locally to more thoroughly exercise the zicond paths > due to inaccuracies elsewhere in the costing model. It was never > supposed to be part of the costing commit though. And as we've seen > it's causing problems with the vector bits. > > While my testing isn't complete, this hunk was never supposed to be > pushed and it's causing problems. So I'm just ripping it out. > > There's a bigger TODO in this space WRT a top-to-bottom evaluation of > the costing on RISC-V. I'm still formulating what that evaluation is > going to look like, so don't hold your breath waiting on it. > > Pushed to the trunk. > > gcc/ > > * config/riscv/riscv.cc (riscv_rtx_costs): Remove errant hunk from > recent commit. > > diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc > index 9e75450aa97..d8fab68dbb4 100644 > --- a/gcc/config/riscv/riscv.cc > +++ b/gcc/config/riscv/riscv.cc > @@ -2913,16 +2913,6 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN > } > return false; > > - case SET: > - /* A simple SET with a register destination takes its cost solely from > - the SET_SRC operand. */ > - if (outer_code == INSN && REG_P (SET_DEST (x))) > - { > - *total = riscv_rtx_costs (SET_SRC (x), mode, SET, opno, total, speed); > - return true; > - } > - return false; > - > default: > return false; > }
On 8/3/23 11:41, Palmer Dabbelt wrote: > On Thu, 03 Aug 2023 08:05:09 PDT (-0700), gcc-patches@gcc.gnu.org wrote: >> >> I'm using this hunk locally to more thoroughly exercise the zicond paths >> due to inaccuracies elsewhere in the costing model. It was never >> supposed to be part of the costing commit though. And as we've seen >> it's causing problems with the vector bits. >> >> While my testing isn't complete, this hunk was never supposed to be >> pushed and it's causing problems. So I'm just ripping it out. >> >> There's a bigger TODO in this space WRT a top-to-bottom evaluation of >> the costing on RISC-V. I'm still formulating what that evaluation is >> going to look like, so don't hold your breath waiting on it. > > We touched on it a bit in the call a few days ago, but I definately > agree that's worth doing: the current cost and pipeline models are from > when the ISA was a lot simpler, it's changed a lot over the last few > years so we're likely to need a bunch of work here. At a bare minimum > there's some refactoring that could be done to make the code saner, it's > been through a few rounds of work so there's some cruft. > > Our rough plan was to get together some microbenchmarks to drive the > model. We're using Microprobe for that, Patrick has been updating it to > support various new ISA extensions -- I think that's all upstream > already. That'll spit out throughput/latency tables that we can build > the pipeline/cost models on top of. The hope was to run these on some > extant systems so we could start answering questions like that > overlapping stores on the C906, but we haven't gotten around to that yet. > > Everything past microprobe is just ideas right now. The cost/pipeline > model related issues on the scalar side look small so most of the worry > is on vector. The general theory is that we're going to need a lot of > work on vector codegen to get things going fast for us, but it's all > very microarchitecture specific. We're not aiming for any of this for > GCC 14. > > So if you guys have time to look that'd be awesome, I don't think anyone > over here will conflict with it any time soon -- aside from whatever > falls out of bugs and the generic optimization work, but no way around > that sort of thing. Right. I think we're probably tackling different issues. The problem I'm looking at right now is when we ask for the cost of a sequence of insns, the underlying costs we get per insn are not at all in line with expectations. We have likely either failed to provide one of the desired hook entries or we're missing toplevel constructs like INSN & SET in our riscv_rtx_costs. I'm not sure which it is yet, but there's clearly something wrong in there. IMHO it falls into the "fix the stupid stuff" space. The problem you're starting to chase sounds more like pipeline modeling with perhaps a bit of riscv_rtx_costs work thrown in as well, but more on the pipeline modeling side. Jeff
On 8/3/23 11:12, Jeff Law via Gcc-patches wrote: >> On Thu, 03 Aug 2023 08:05:09 PDT (-0700), gcc-patches@gcc.gnu.org wrote: >>> [...] >>> >>> There's a bigger TODO in this space WRT a top-to-bottom evaluation of >>> the costing on RISC-V. I'm still formulating what that evaluation is >>> going to look like, so don't hold your breath waiting on it. >> >> [...] > Right. I think we're probably tackling different issues. > > The problem I'm looking at right now is when we ask for the cost of a > sequence of insns, the underlying costs we get per insn are not at all > in line with expectations. We have likely either failed to provide > one of the desired hook entries or we're missing toplevel constructs > like INSN & SET in our riscv_rtx_costs. I'm not sure which it is yet, > but there's clearly something wrong in there. IMHO it falls into the > "fix the stupid stuff" space. As discussed in Tue call, I definitely have 1 fix to riscv_rtx_costs (), which is worth pondering. It adjusts the cost of consts and helps Hoist GCSE constants (which granted kicks in only at -Os). However it does affect codegen in subtle ways since CSE1 now for some cases generates additional REG_EQUAL note. Again it might not help your issue. I'll post a RFC patch along with a test case - which in turns comes from an old PR I stumbled upon during research on const remat. On a different but slightly related note, I was playing with Zicond this morning (enabling by default in toolchain build and qemu) and I was surprised to see that glibc build currently doesn't have a single czero* insn - although gcc has bene configured with Point being, I can help with riscv_rtx_costs and friends, if you have something specific, but seems like you are in the thick of it and have that handled so I'll stay out of the way and refrain from zicond work/testing for the time being. Thx, -Vineet
On 8/3/23 16:26, Vineet Gupta wrote: > > As discussed in Tue call, I definitely have 1 fix to riscv_rtx_costs (), > which is worth pondering. It adjusts the cost of consts and helps Hoist > GCSE constants (which granted kicks in only at -Os). However it does > affect codegen in subtle ways since CSE1 now for some cases generates > additional REG_EQUAL note. Yea, we'll definitely want to take a look. It's not likely affecting my work, but definitely want to evaluate as part of the overall costing question. > > On a different but slightly related note, I was playing with Zicond this > morning (enabling by default in toolchain build and qemu) and I was > surprised to see that glibc build currently doesn't have a single czero* > insn - although gcc has bene configured with Probably costing ;-) That little hunk that snuck through is part of how I arrange to get more czero instructions when running the GCC testsuite. Also note that if you're disassembling with binutils, I think it just dumps it out as a .word or something similar. > > Point being, I can help with riscv_rtx_costs and friends, if you have > something specific, but seems like you are in the thick of it and have > that handled so I'll stay out of the way and refrain from zicond > work/testing for the time being. ;-) Actually if you wanted to poke at zicond, the most interesting unexplored area I've come across is the COND_EXPR handling in gimple. When we expand a COND_EXPR into RTL the first approach we take is to try mov<mode>cc in RTL. Unfortunately we don't create COND_EXPRs all that often in gimple. Some simple match.pd patterns would likely really help here. The problem is RTL expansion when mov<mode>cc FAILs is usually poor at best. So if we're going to add those match.pd patterns, we probably need to beef up the RTL expansion code to do a better job when the target doesn't have a mov<mode>cc RTL pattern. Jeff
On 8/3/23 16:15, Jeff Law wrote: > > > On 8/3/23 16:26, Vineet Gupta wrote: >> >> As discussed in Tue call, I definitely have 1 fix to riscv_rtx_costs >> (), which is worth pondering. It adjusts the cost of consts and helps >> Hoist GCSE constants (which granted kicks in only at -Os). However it >> does affect codegen in subtle ways since CSE1 now for some cases >> generates additional REG_EQUAL note. > Yea, we'll definitely want to take a look. It's not likely affecting > my work, but definitely want to evaluate as part of the overall > costing question. OK, I'll spin it and post to the list after a testsuite run. > > >> >> On a different but slightly related note, I was playing with Zicond >> this morning (enabling by default in toolchain build and qemu) and I >> was surprised to see that glibc build currently doesn't have a single >> czero* insn - although gcc has bene configured with > Probably costing ;-) That little hunk that snuck through is part of > how I arrange to get more czero instructions when running the GCC > testsuite. > > Also note that if you're disassembling with binutils, I think it just > dumps it out as a .word or something similar. Yeah something was off, I was building /using the latest binutils but then my fancy over-engineered objdump helper was pointing to an a non-zicond binutils :-) > >> >> Point being, I can help with riscv_rtx_costs and friends, if you have >> something specific, but seems like you are in the thick of it and >> have that handled so I'll stay out of the way and refrain from zicond >> work/testing for the time being. > ;-) Actually if you wanted to poke at zicond, the most interesting > unexplored area I've come across is the COND_EXPR handling in gimple. > When we expand a COND_EXPR into RTL the first approach we take is to > try mov<mode>cc in RTL. > > Unfortunately we don't create COND_EXPRs all that often in gimple. > Some simple match.pd patterns would likely really help here. > > The problem is RTL expansion when mov<mode>cc FAILs is usually poor at > best. So if we're going to add those match.pd patterns, we probably > need to beef up the RTL expansion code to do a better job when the > target doesn't have a mov<mode>cc RTL pattern. Ok, I'll add that to my todo list. Thx, -Vineet
On 8/3/23 17:38, Vineet Gupta wrote: >> ;-) Actually if you wanted to poke at zicond, the most interesting >> unexplored area I've come across is the COND_EXPR handling in gimple. >> When we expand a COND_EXPR into RTL the first approach we take is to >> try mov<mode>cc in RTL. >> >> Unfortunately we don't create COND_EXPRs all that often in gimple. >> Some simple match.pd patterns would likely really help here. >> >> The problem is RTL expansion when mov<mode>cc FAILs is usually poor at >> best. So if we're going to add those match.pd patterns, we probably >> need to beef up the RTL expansion code to do a better job when the >> target doesn't have a mov<mode>cc RTL pattern. > > Ok, I'll add that to my todo list. You might want to reach out to Andrew Pinski if you do poke at this. I made a reference to this issue in a BZ he recently commented on. It was an x86 issue with cmov generation, but the same core issue applies -- we're not generating COND_EXPRs very aggressively in gimple. jeff
On Thu, Aug 3, 2023 at 10:31 PM Jeff Law via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > > > > On 8/3/23 17:38, Vineet Gupta wrote: > > >> ;-) Actually if you wanted to poke at zicond, the most interesting > >> unexplored area I've come across is the COND_EXPR handling in gimple. > >> When we expand a COND_EXPR into RTL the first approach we take is to > >> try mov<mode>cc in RTL. > >> > >> Unfortunately we don't create COND_EXPRs all that often in gimple. > >> Some simple match.pd patterns would likely really help here. > >> > >> The problem is RTL expansion when mov<mode>cc FAILs is usually poor at > >> best. So if we're going to add those match.pd patterns, we probably > >> need to beef up the RTL expansion code to do a better job when the > >> target doesn't have a mov<mode>cc RTL pattern. > > > > Ok, I'll add that to my todo list. > You might want to reach out to Andrew Pinski if you do poke at this. I > made a reference to this issue in a BZ he recently commented on. It was > an x86 issue with cmov generation, but the same core issue applies -- > we're not generating COND_EXPRs very aggressively in gimple. Yes I have some ideas of producing more aggressively COND_EXPR in either isel or in the last phiopt. There is also a canonicalization form issue dealing with `bool * b` representing `bool ? b : 0` where isel could select between the COND_EXPR and multiply too. This is the issue Jeff is talking about too. Thanks, Andrew > > jeff
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc index 9e75450aa97..d8fab68dbb4 100644 --- a/gcc/config/riscv/riscv.cc +++ b/gcc/config/riscv/riscv.cc @@ -2913,16 +2913,6 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN } return false; - case SET: - /* A simple SET with a register destination takes its cost solely from - the SET_SRC operand. */ - if (outer_code == INSN && REG_P (SET_DEST (x))) - { - *total = riscv_rtx_costs (SET_SRC (x), mode, SET, opno, total, speed); - return true; - } - return false; - default: return false; }