Message ID | CABu31nP1vUUrmjDSdyE4LwjX0=7rOuect00E9UZLv7C3H3_58g@mail.gmail.com |
---|---|
State | New |
Headers | show |
On Tue, Jul 10, 2012 at 8:20 AM, Steven Bosscher <stevenb.gcc@gmail.com> wrote: > Hello, > > These look like typos: > > * "power4-store-update" wants "iuX,iuY" for X=1|2 and Y=1|2. The > "iu2,iu1" case appeared twice. > * "power4-three" wants "iuX,iuX,iuY|iuX,iuY,iuY" for X=1|2 and Y=1|2. > The "iu1,iu1,iu2" case appeared twice. > > Bootstrapped&tested on powerpc64-unknown-linux-gnu. > OK for trunk? > > Note, it'd be nice if the size of the power4iu automaton could be > reduced somehow. It is by far the largest automaton in the rs6000 back > end, accounting for ~40% of the total. Steven, The change to the power3-three reservation is okay. The change to power4-store-update looks incorrect or at least incomplete. These reservations and others were changed by Vlad in March/April 2004 to fix a consistency check that he introduced at the time. Note that the dispatch units for the final choice also is wrong and duplicated from the previous line for power4-store-update: |(du3_power4+du4_power4,lsu2_power4)\ |(du3_power4+du4_power4,lsu2_power4))+\ It looks like the final line should be du4_power4+du1_power4,lsu1_power4 On POWER4, the dispatch slot forces specific function units. I don't remember and don't have the documents handy to know if the function units for store-update should be 2-1 2-2 1-1 1-2 or 2-1 2-2 1-2 1-1 I think it is the latter. If you look at the thread from 2009, we discuss that the POWER4 scheduler description already is an approximation because an accurate description creates an unreasonably large automata. A lot of the problem is the 1 cycle delay for dependent integer ops. Thanks, David
On 07/17/2012 12:51 PM, David Edelsohn wrote: > The change to power4-store-update looks incorrect or at least incomplete. > > These reservations and others were changed by Vlad in March/April 2004 > to fix a consistency check that he introduced at the time. Note that > the dispatch units for the final choice also is wrong and duplicated > from the previous line for power4-store-update: > > |(du3_power4+du4_power4,lsu2_power4)\ > |(du3_power4+du4_power4,lsu2_power4))+\ > > It looks like the final line should be > > du4_power4+du1_power4,lsu1_power4 This 4th line should be removed altogether. If only the 4th dispatch slot is available and a cracked (or microcoded) insn is next, a nop will fill that slot and the cracked insn will start a new dispatch group. > > On POWER4, the dispatch slot forces specific function units. I don't > remember and don't have the documents handy to know if the function > units for store-update should be > > 2-1 > 2-2 > 1-1 > 1-2 > > or > > 2-1 > 2-2 > 1-2 > 1-1 > > I think it is the latter. On Power4, the dispatch to issue/execution unit affinity is as follows (4 dispatch slots, 2 FXU/LSU issue/execution units): 1 -> 1 2 -> 2 3 -> 2 4 -> 1 Update form stores are cracked into a store and addi, and the store is dual issued to the LSU/FXU. So based on this info I would think it should look like the following (untested). @@ -141,12 +141,10 @@ (define_insn_reservation "power4-store-u (eq_attr "cpu" "power4")) "((du1_power4+du2_power4,lsu1_power4)\ |(du2_power4+du3_power4,lsu2_power4)\ - |(du3_power4+du4_power4,lsu2_power4)\ |(du3_power4+du4_power4,lsu2_power4))+\ - ((nothing,iu2_power4,iu1_power4)\ + ((nothing,iu1_power4,iu2_power4)\ |(nothing,iu2_power4,iu2_power4)\ - |(nothing,iu1_power4,iu2_power4)\ - |(nothing,iu1_power4,iu2_power4))") + |(nothing,iu2_power4,iu1_power4))") -Pat > > If you look at the thread from 2009, we discuss that the POWER4 > scheduler description already is an approximation because an accurate > description creates an unreasonably large automata. A lot of the > problem is the 1 cycle delay for dependent integer ops.
On Thu, Jul 19, 2012 at 2:43 PM, Pat Haugen <pthaugen@linux.vnet.ibm.com> wrote: > Update form stores are cracked into a store and addi, and the store is dual > issued to the LSU/FXU. So based on this info I would think it should look > like the following (untested). > > @@ -141,12 +141,10 @@ (define_insn_reservation "power4-store-u > (eq_attr "cpu" "power4")) > "((du1_power4+du2_power4,lsu1_power4)\ > |(du2_power4+du3_power4,lsu2_power4)\ > - |(du3_power4+du4_power4,lsu2_power4)\ > |(du3_power4+du4_power4,lsu2_power4))+\ > - ((nothing,iu2_power4,iu1_power4)\ > + ((nothing,iu1_power4,iu2_power4)\ > |(nothing,iu2_power4,iu2_power4)\ > - |(nothing,iu1_power4,iu2_power4)\ > - |(nothing,iu1_power4,iu2_power4))") > + |(nothing,iu2_power4,iu1_power4))") Looks good. Check it in after testing. Thanks, David
On 07/20/2012 03:34 PM, David Edelsohn wrote: > Looks good. Check it in after testing. > Testing went fine, patch checked in. -Pat
Index: config/rs6000/power4.md =================================================================== --- config/rs6000/power4.md (revision 189388) +++ config/rs6000/power4.md (working copy) @@ -145,7 +145,7 @@ |(du3_power4+du4_power4,lsu2_power4))+\ ((nothing,iu2_power4,iu1_power4)\ |(nothing,iu2_power4,iu2_power4)\ - |(nothing,iu1_power4,iu2_power4)\ + |(nothing,iu1_power4,iu1_power4)\ |(nothing,iu1_power4,iu2_power4))") (define_insn_reservation "power4-store-update-indexed" 12 @@ -212,7 +212,7 @@ ((iu1_power4,nothing,iu2_power4,nothing,iu2_power4)\ |(iu2_power4,nothing,iu2_power4,nothing,iu1_power4)\ |(iu2_power4,nothing,iu1_power4,nothing,iu1_power4)\ - |(iu1_power4,nothing,iu2_power4,nothing,iu2_power4))") + |(iu1_power4,nothing,iu1_power4,nothing,iu2_power4))") (define_insn_reservation "power4-insert" 4 (and (eq_attr "type" "insert_word")