Message ID | ZZiTS0adUUPx7wjY@cowardly-lion.the-meissners.org |
---|---|
State | New |
Series | Repost [PATCH 4/6] PowerPC: Make MMA insns support DMR registers. |
Ping | Date: Fri, 5 Jan 2024 18:39:55 -0500 | From: Michael Meissner <meissner@linux.ibm.com> | Subject: Repost [PATCH 4/6] PowerPC: Make MMA insns support DMR registers. | Message-ID: <ZZiTS0adUUPx7wjY@cowardly-lion.the-meissners.org> https://gcc.gnu.org/pipermail/gcc-patches/2024-January/641964.html
Hi Mike,

on 2024/1/6 07:39, Michael Meissner wrote:
> This patch changes the MMA instructions to use either FPR registers
> (-mcpu=power10) or DMRs (-mcpu=future). In this patch, the existing MMA
> instruction names are used.
>
> A macro (__PPC_DMR__) is defined if the MMA instructions use the DMRs.
>
> The patches have been tested on both little and big endian systems. Can I check
> it into the master branch?
>
> 2024-01-05 Michael Meissner <meissner@linux.ibm.com>
>
> gcc/
>
> 	* config/rs6000/mma.md (mma_<acc>): New define_expand to handle
> 	mma_<acc> for dense math and non dense math.
> 	(mma_<acc> insn): Restrict to non dense math.
> 	(mma_xxsetaccz): Convert to define_expand to handle non dense math and
> 	dense math.
> 	(mma_xxsetaccz_vsx): Rename from mma_xxsetaccz and restrict usage to non
> 	dense math.
> 	(mma_xxsetaccz_dm): Dense math version of mma_xxsetaccz.
> 	(mma_<vv>): Add support for dense math.
> 	(mma_<avv>): Likewise.
> 	(mma_<pv>): Likewise.
> 	(mma_<apv>): Likewise.
> 	(mma_<vvi4i4i8>): Likewise.
> 	(mma_<avvi4i4i8>): Likewise.
> 	(mma_<vvi4i4i2>): Likewise.
> 	(mma_<avvi4i4i2>): Likewise.
> 	(mma_<vvi4i4>): Likewise.
> 	(mma_<avvi4i4>): Likewise.
> 	(mma_<pvi4i2>): Likewise.
> 	(mma_<apvi4i2>): Likewise.
> 	(mma_<vvi4i4i4>): Likewise.
> 	(mma_<avvi4i4i4>): Likewise.
> 	* config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Define
> 	__PPC_DMR__ if we have dense math instructions.
> 	* config/rs6000/rs6000.cc (print_operand): Make %A handle only DMRs if
> 	dense math and only FPRs if not dense math.
> 	(rs6000_split_multireg_move): Do not generate the xxmtacc instruction to
> 	prime the DMR registers or the xxmfacc instruction to de-prime
> 	instructions if we have dense math register support.
> ---
>  gcc/config/rs6000/mma.md      | 247 +++++++++++++++++++++-------------
>  gcc/config/rs6000/rs6000-c.cc |   3 +
>  gcc/config/rs6000/rs6000.cc   |  35 ++---
>  3 files changed, 176 insertions(+), 109 deletions(-)
>
> diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
> index bb898919ab5..525a85146ff 100644
> --- a/gcc/config/rs6000/mma.md
> +++ b/gcc/config/rs6000/mma.md
> @@ -559,190 +559,249 @@ (define_insn "*mma_disassemble_acc_dm"
>    "dmxxextfdmr256 %0,%1,2"
>    [(set_attr "type" "mma")])
>
> -(define_insn "mma_<acc>"
> +;; MMA instructions that do not use their accumulators as an input, still must
> +;; not allow their vector operands to overlap the registers used by the
> +;; accumulator. We enforce this by marking the output as early clobber. If we
> +;; have dense math, we don't need the whole prime/de-prime action, so just make
> +;; thse instructions be NOPs.

typo: thse.

> +
> +(define_expand "mma_<acc>"
> +  [(set (match_operand:XO 0 "register_operand")
> +        (unspec:XO [(match_operand:XO 1 "register_operand")]

s/register_operand/accumulator_operand/?

> +                   MMA_ACC))]
> +  "TARGET_MMA"
> +{
> +  if (TARGET_DENSE_MATH)
> +    {
> +      if (!rtx_equal_p (operands[0], operands[1]))
> +        emit_move_insn (operands[0], operands[1]);
> +      DONE;
> +    }
> +
> +  /* Generate the prime/de-prime code. */
> +})
> +
> +(define_insn "*mma_<acc>"

May be better to name with "*mma_<acc>_nodm"?

>    [(set (match_operand:XO 0 "fpr_reg_operand" "=&d")
>          (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0")]
>                     MMA_ACC))]
> -  "TARGET_MMA"
> +  "TARGET_MMA && !TARGET_DENSE_MATH"

I found that "TARGET_MMA && !TARGET_DENSE_MATH" is used much (like changes in function
rs6000_split_multireg_move in this patch and some places in previous patches), maybe we
can introduce a macro named as TARGET_MMA_NODM short for it?

>    "<acc> %A0"
>    [(set_attr "type" "mma")])
>
>  ;; We can't have integer constants in XOmode so we wrap this in an
> -;; UNSPEC_VOLATILE.
> +;; UNSPEC_VOLATILE for the non-dense math case. For dense math, we don't need
> +;; to disable optimization and we can do a normal UNSPEC.
>
> -(define_insn "mma_xxsetaccz"
> -  [(set (match_operand:XO 0 "fpr_reg_operand" "=d")
> +(define_expand "mma_xxsetaccz"
> +  [(set (match_operand:XO 0 "register_operand")

s/register_operand/accumulator_operand/?

>          (unspec_volatile:XO [(const_int 0)]
>                              UNSPECV_MMA_XXSETACCZ))]
>    "TARGET_MMA"
> +{
> +  if (TARGET_DENSE_MATH)
> +    {
> +      emit_insn (gen_mma_xxsetaccz_dm (operands[0]));
> +      DONE;
> +    }
> +})
> +
> +(define_insn "*mma_xxsetaccz_vsx"

s/vsx/nodm/

> +  [(set (match_operand:XO 0 "fpr_reg_operand" "=d")
> +        (unspec_volatile:XO [(const_int 0)]
> +                            UNSPECV_MMA_XXSETACCZ))]
> +  "TARGET_MMA && !TARGET_DENSE_MATH"
>    "xxsetaccz %A0"
>    [(set_attr "type" "mma")])
>
> +
> +(define_insn "mma_xxsetaccz_dm"
> +  [(set (match_operand:XO 0 "dmr_operand" "=wD")
> +        (unspec:XO [(const_int 0)]
> +                   UNSPECV_MMA_XXSETACCZ))]
> +  "TARGET_DENSE_MATH"
> +  "dmsetdmrz %0"
> +  [(set_attr "type" "mma")])
> +
>  (define_insn "mma_<vv>"
> -  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
> -        (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa")
> -                    (match_operand:V16QI 2 "vsx_register_operand" "v,?wa")]
> +  [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d")
> +        (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "wa,v,?wa")
> +                    (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa")]
>                     MMA_VV))]
>    "TARGET_MMA"
>    "<vv> %A0,%x1,%x2"
> -  [(set_attr "type" "mma")])
> +  [(set_attr "type" "mma")
> +   (set_attr "isa" "dm,not_dm,not_dm")])

Like what's suggested in previous patches, s/not_dm/nodm/

The others look good to me, thanks!
BR, Kewen > > (define_insn "mma_<avv>" > - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") > - (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0") > - (match_operand:V16QI 2 "vsx_register_operand" "v,?wa") > - (match_operand:V16QI 3 "vsx_register_operand" "v,?wa")] > + [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d") > + (unspec:XO [(match_operand:XO 1 "accumulator_operand" "0,0,0") > + (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa") > + (match_operand:V16QI 3 "vsx_register_operand" "wa,v,?wa")] > MMA_AVV))] > "TARGET_MMA" > "<avv> %A0,%x2,%x3" > - [(set_attr "type" "mma")]) > + [(set_attr "type" "mma") > + (set_attr "isa" "dm,not_dm,not_dm")]) > > (define_insn "mma_<pv>" > - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") > - (unspec:XO [(match_operand:OO 1 "vsx_register_operand" "v,?wa") > - (match_operand:V16QI 2 "vsx_register_operand" "v,?wa")] > + [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d") > + (unspec:XO [(match_operand:OO 1 "vsx_register_operand" "wa,v,?wa") > + (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa")] > MMA_PV))] > "TARGET_MMA" > "<pv> %A0,%x1,%x2" > - [(set_attr "type" "mma")]) > + [(set_attr "type" "mma") > + (set_attr "isa" "dm,not_dm,not_dm")]) > > (define_insn "mma_<apv>" > - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") > - (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0") > - (match_operand:OO 2 "vsx_register_operand" "v,?wa") > - (match_operand:V16QI 3 "vsx_register_operand" "v,?wa")] > + [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d") > + (unspec:XO [(match_operand:XO 1 "accumulator_operand" "0,0,0") > + (match_operand:OO 2 "vsx_register_operand" "wa,v,?wa") > + (match_operand:V16QI 3 "vsx_register_operand" "wa,v,?wa")] > MMA_APV))] > "TARGET_MMA" > "<apv> %A0,%x2,%x3" > - [(set_attr "type" "mma")]) > + [(set_attr "type" "mma") > + (set_attr "isa" "dm,not_dm,not_dm")]) > > (define_insn "mma_<vvi4i4i8>" > - [(set (match_operand:XO 0 
"fpr_reg_operand" "=&d,&d") > - (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa") > - (match_operand:V16QI 2 "vsx_register_operand" "v,?wa") > - (match_operand:SI 3 "const_0_to_15_operand" "n,n") > - (match_operand:SI 4 "const_0_to_15_operand" "n,n") > - (match_operand:SI 5 "u8bit_cint_operand" "n,n")] > + [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d") > + (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "wa,v,?wa") > + (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa") > + (match_operand:SI 3 "const_0_to_15_operand" "n,n,n") > + (match_operand:SI 4 "const_0_to_15_operand" "n,n,n") > + (match_operand:SI 5 "u8bit_cint_operand" "n,n,n")] > MMA_VVI4I4I8))] > "TARGET_MMA" > "<vvi4i4i8> %A0,%x1,%x2,%3,%4,%5" > [(set_attr "type" "mma") > - (set_attr "prefixed" "yes")]) > + (set_attr "prefixed" "yes") > + (set_attr "isa" "dm,not_dm,not_dm")]) > > (define_insn "mma_<avvi4i4i8>" > - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") > - (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0") > - (match_operand:V16QI 2 "vsx_register_operand" "v,?wa") > - (match_operand:V16QI 3 "vsx_register_operand" "v,?wa") > - (match_operand:SI 4 "const_0_to_15_operand" "n,n") > - (match_operand:SI 5 "const_0_to_15_operand" "n,n") > - (match_operand:SI 6 "u8bit_cint_operand" "n,n")] > + [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d") > + (unspec:XO [(match_operand:XO 1 "accumulator_operand" "0,0,0") > + (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa") > + (match_operand:V16QI 3 "vsx_register_operand" "wa,v,?wa") > + (match_operand:SI 4 "const_0_to_15_operand" "n,n,n") > + (match_operand:SI 5 "const_0_to_15_operand" "n,n,n") > + (match_operand:SI 6 "u8bit_cint_operand" "n,n,n")] > MMA_AVVI4I4I8))] > "TARGET_MMA" > "<avvi4i4i8> %A0,%x2,%x3,%4,%5,%6" > [(set_attr "type" "mma") > - (set_attr "prefixed" "yes")]) > + (set_attr "prefixed" "yes") > + (set_attr "isa" "dm,not_dm,not_dm")]) > > (define_insn "mma_<vvi4i4i2>" > - 
[(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") > - (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa") > - (match_operand:V16QI 2 "vsx_register_operand" "v,?wa") > - (match_operand:SI 3 "const_0_to_15_operand" "n,n") > - (match_operand:SI 4 "const_0_to_15_operand" "n,n") > - (match_operand:SI 5 "const_0_to_3_operand" "n,n")] > + [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d") > + (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "wa,v,?wa") > + (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa") > + (match_operand:SI 3 "const_0_to_15_operand" "n,n,n") > + (match_operand:SI 4 "const_0_to_15_operand" "n,n,n") > + (match_operand:SI 5 "const_0_to_3_operand" "n,n,n")] > MMA_VVI4I4I2))] > "TARGET_MMA" > "<vvi4i4i2> %A0,%x1,%x2,%3,%4,%5" > [(set_attr "type" "mma") > - (set_attr "prefixed" "yes")]) > + (set_attr "prefixed" "yes") > + (set_attr "isa" "dm,not_dm,not_dm")]) > > (define_insn "mma_<avvi4i4i2>" > - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") > - (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0") > - (match_operand:V16QI 2 "vsx_register_operand" "v,?wa") > - (match_operand:V16QI 3 "vsx_register_operand" "v,?wa") > - (match_operand:SI 4 "const_0_to_15_operand" "n,n") > - (match_operand:SI 5 "const_0_to_15_operand" "n,n") > - (match_operand:SI 6 "const_0_to_3_operand" "n,n")] > + [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d") > + (unspec:XO [(match_operand:XO 1 "accumulator_operand" "0,0,0") > + (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa") > + (match_operand:V16QI 3 "vsx_register_operand" "wa,v,?wa") > + (match_operand:SI 4 "const_0_to_15_operand" "n,n,n") > + (match_operand:SI 5 "const_0_to_15_operand" "n,n,n") > + (match_operand:SI 6 "const_0_to_3_operand" "n,n,n")] > MMA_AVVI4I4I2))] > "TARGET_MMA" > "<avvi4i4i2> %A0,%x2,%x3,%4,%5,%6" > [(set_attr "type" "mma") > - (set_attr "prefixed" "yes")]) > + (set_attr "prefixed" "yes") > + (set_attr "isa" "dm,not_dm,not_dm")]) > > 
(define_insn "mma_<vvi4i4>" > - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") > - (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa") > - (match_operand:V16QI 2 "vsx_register_operand" "v,?wa") > - (match_operand:SI 3 "const_0_to_15_operand" "n,n") > - (match_operand:SI 4 "const_0_to_15_operand" "n,n")] > + [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d") > + (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "wa,v,?wa") > + (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa") > + (match_operand:SI 3 "const_0_to_15_operand" "n,n,n") > + (match_operand:SI 4 "const_0_to_15_operand" "n,n,n")] > MMA_VVI4I4))] > "TARGET_MMA" > "<vvi4i4> %A0,%x1,%x2,%3,%4" > [(set_attr "type" "mma") > - (set_attr "prefixed" "yes")]) > + (set_attr "prefixed" "yes") > + (set_attr "isa" "dm,not_dm,not_dm")]) > > (define_insn "mma_<avvi4i4>" > - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") > - (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0") > - (match_operand:V16QI 2 "vsx_register_operand" "v,?wa") > - (match_operand:V16QI 3 "vsx_register_operand" "v,?wa") > - (match_operand:SI 4 "const_0_to_15_operand" "n,n") > - (match_operand:SI 5 "const_0_to_15_operand" "n,n")] > + [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d") > + (unspec:XO [(match_operand:XO 1 "accumulator_operand" "0,0,0") > + (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa") > + (match_operand:V16QI 3 "vsx_register_operand" "wa,v,?wa") > + (match_operand:SI 4 "const_0_to_15_operand" "n,n,n") > + (match_operand:SI 5 "const_0_to_15_operand" "n,n,n")] > MMA_AVVI4I4))] > "TARGET_MMA" > "<avvi4i4> %A0,%x2,%x3,%4,%5" > [(set_attr "type" "mma") > - (set_attr "prefixed" "yes")]) > + (set_attr "prefixed" "yes") > + (set_attr "isa" "dm,not_dm,not_dm")]) > > (define_insn "mma_<pvi4i2>" > - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") > - (unspec:XO [(match_operand:OO 1 "vsx_register_operand" "v,?wa") > - (match_operand:V16QI 2 "vsx_register_operand" 
"v,?wa") > - (match_operand:SI 3 "const_0_to_15_operand" "n,n") > - (match_operand:SI 4 "const_0_to_3_operand" "n,n")] > + [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d") > + (unspec:XO [(match_operand:OO 1 "vsx_register_operand" "wa,v,?wa") > + (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa") > + (match_operand:SI 3 "const_0_to_15_operand" "n,n,n") > + (match_operand:SI 4 "const_0_to_3_operand" "n,n,n")] > MMA_PVI4I2))] > "TARGET_MMA" > "<pvi4i2> %A0,%x1,%x2,%3,%4" > [(set_attr "type" "mma") > - (set_attr "prefixed" "yes")]) > + (set_attr "prefixed" "yes") > + (set_attr "isa" "dm,not_dm,not_dm")]) > > (define_insn "mma_<apvi4i2>" > - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") > - (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0") > - (match_operand:OO 2 "vsx_register_operand" "v,?wa") > - (match_operand:V16QI 3 "vsx_register_operand" "v,?wa") > - (match_operand:SI 4 "const_0_to_15_operand" "n,n") > - (match_operand:SI 5 "const_0_to_3_operand" "n,n")] > + [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d") > + (unspec:XO [(match_operand:XO 1 "accumulator_operand" "0,0,0") > + (match_operand:OO 2 "vsx_register_operand" "wa,v,?wa") > + (match_operand:V16QI 3 "vsx_register_operand" "wa,v,?wa") > + (match_operand:SI 4 "const_0_to_15_operand" "n,n,n") > + (match_operand:SI 5 "const_0_to_3_operand" "n,n,n")] > MMA_APVI4I2))] > "TARGET_MMA" > "<apvi4i2> %A0,%x2,%x3,%4,%5" > [(set_attr "type" "mma") > - (set_attr "prefixed" "yes")]) > + (set_attr "prefixed" "yes") > + (set_attr "isa" "dm,not_dm,not_dm")]) > > (define_insn "mma_<vvi4i4i4>" > - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") > - (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa") > - (match_operand:V16QI 2 "vsx_register_operand" "v,?wa") > - (match_operand:SI 3 "const_0_to_15_operand" "n,n") > - (match_operand:SI 4 "const_0_to_15_operand" "n,n") > - (match_operand:SI 5 "const_0_to_15_operand" "n,n")] > + [(set (match_operand:XO 0 
"accumulator_operand" "=wD,&d,&d") > + (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "wa,v,?wa") > + (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa") > + (match_operand:SI 3 "const_0_to_15_operand" "n,n,n") > + (match_operand:SI 4 "const_0_to_15_operand" "n,n,n") > + (match_operand:SI 5 "const_0_to_15_operand" "n,n,n")] > MMA_VVI4I4I4))] > "TARGET_MMA" > "<vvi4i4i4> %A0,%x1,%x2,%3,%4,%5" > [(set_attr "type" "mma") > - (set_attr "prefixed" "yes")]) > + (set_attr "prefixed" "yes") > + (set_attr "isa" "dm,not_dm,not_dm")]) > > (define_insn "mma_<avvi4i4i4>" > - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") > - (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0") > - (match_operand:V16QI 2 "vsx_register_operand" "v,?wa") > - (match_operand:V16QI 3 "vsx_register_operand" "v,?wa") > - (match_operand:SI 4 "const_0_to_15_operand" "n,n") > - (match_operand:SI 5 "const_0_to_15_operand" "n,n") > - (match_operand:SI 6 "const_0_to_15_operand" "n,n")] > + [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d") > + (unspec:XO [(match_operand:XO 1 "accumulator_operand" "0,0,0") > + (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa") > + (match_operand:V16QI 3 "vsx_register_operand" "wa,v,?wa") > + (match_operand:SI 4 "const_0_to_15_operand" "n,n,n") > + (match_operand:SI 5 "const_0_to_15_operand" "n,n,n") > + (match_operand:SI 6 "const_0_to_15_operand" "n,n,n")] > MMA_AVVI4I4I4))] > "TARGET_MMA" > "<avvi4i4i4> %A0,%x2,%x3,%4,%5,%6" > [(set_attr "type" "mma") > - (set_attr "prefixed" "yes")]) > + (set_attr "prefixed" "yes") > + (set_attr "isa" "dm,not_dm,not_dm")]) > diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc > index f2fb5bef678..4342620f87f 100644 > --- a/gcc/config/rs6000/rs6000-c.cc > +++ b/gcc/config/rs6000/rs6000-c.cc > @@ -600,6 +600,9 @@ rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags) > /* Tell the user if we support the MMA instructions. 
*/ > if ((flags & OPTION_MASK_MMA) != 0) > rs6000_define_or_undefine_macro (define_p, "__MMA__"); > + /* Tell the user if we support the dense math instructions. */ > + if ((flags & OPTION_MASK_DENSE_MATH) != 0) > + rs6000_define_or_undefine_macro (define_p, "__PPC_DMR__"); > /* Whether pc-relative code is being generated. */ > if ((flags & OPTION_MASK_PCREL) != 0) > rs6000_define_or_undefine_macro (define_p, "__PCREL__"); > diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc > index 83e32f7a43a..59517c8608d 100644 > --- a/gcc/config/rs6000/rs6000.cc > +++ b/gcc/config/rs6000/rs6000.cc > @@ -14264,8 +14264,13 @@ print_operand (FILE *file, rtx x, int code) > overlapping with the FPR registers. */ > if (!REG_P (x)) > output_operand_lossage ("invalid %%A value"); > - else if (TARGET_DENSE_MATH && DMR_REGNO_P (REGNO (x))) > - fprintf (file, "%d", REGNO (x) - FIRST_DMR_REGNO); > + else if (TARGET_DENSE_MATH) > + { > + if (DMR_REGNO_P (REGNO (x))) > + fprintf (file, "%d", REGNO (x) - FIRST_DMR_REGNO); > + else > + output_operand_lossage ("%%A operand is not a DMR"); > + } > else if (!FP_REGNO_P (REGNO (x)) || (REGNO (x) % 4) != 0) > output_operand_lossage ("invalid %%A value"); > else > @@ -27719,7 +27724,7 @@ rs6000_split_multireg_move (rtx dst, rtx src) > > /* If we are reading an accumulator register, we have to > deprime it before we can access it. */ > - if (TARGET_MMA > + if (TARGET_MMA && !TARGET_DENSE_MATH > && GET_MODE (src) == XOmode && FP_REGNO_P (REGNO (src))) > emit_insn (gen_mma_xxmfacc (src, src)); > > @@ -27751,9 +27756,9 @@ rs6000_split_multireg_move (rtx dst, rtx src) > emit_insn (gen_rtx_SET (dst2, src2)); > } > > - /* If we are writing an accumulator register, we have to > - prime it after we've written it. */ > - if (TARGET_MMA > + /* If we are writing an accumulator register that overlaps with the > + FPR registers, we have to prime it after we've written it. 
*/ > + if (TARGET_MMA && !TARGET_DENSE_MATH > && GET_MODE (dst) == XOmode && FP_REGNO_P (REGNO (dst))) > emit_insn (gen_mma_xxmtacc (dst, dst)); > > @@ -27822,9 +27827,9 @@ rs6000_split_multireg_move (rtx dst, rtx src) > emit_insn (gen_rtx_SET (dst_i, op)); > } > > - /* We are writing an accumulator register, so we have to > - prime it after we've written it. */ > - if (GET_MODE (src) == XOmode) > + /* On systems without dense math where accumulators overlap with the > + vector registers, we have to prime it after we've written it. */ > + if (GET_MODE (src) == XOmode && !TARGET_DENSE_MATH) > emit_insn (gen_mma_xxmtacc (dst, dst)); > > return; > @@ -27835,9 +27840,9 @@ rs6000_split_multireg_move (rtx dst, rtx src) > > if (REG_P (src) && REG_P (dst) && (REGNO (src) < REGNO (dst))) > { > - /* If we are reading an accumulator register, we have to > - deprime it before we can access it. */ > - if (TARGET_MMA > + /* If we are reading an accumulator register and we don't have dense > + math, we have to deprime it before we can access it. */ > + if (TARGET_MMA && !TARGET_DENSE_MATH > && GET_MODE (src) == XOmode && FP_REGNO_P (REGNO (src))) > emit_insn (gen_mma_xxmfacc (src, src)); > > @@ -27865,7 +27870,7 @@ rs6000_split_multireg_move (rtx dst, rtx src) > > /* If we are writing an accumulator register, we have to > prime it after we've written it. */ > - if (TARGET_MMA > + if (TARGET_MMA && !TARGET_DENSE_MATH > && GET_MODE (dst) == XOmode && FP_REGNO_P (REGNO (dst))) > emit_insn (gen_mma_xxmtacc (dst, dst)); > } > @@ -28002,7 +28007,7 @@ rs6000_split_multireg_move (rtx dst, rtx src) > > /* If we are reading an accumulator register, we have to > deprime it before we can access it. 
*/ > - if (TARGET_MMA && REG_P (src) > + if (TARGET_MMA && !TARGET_DENSE_MATH && REG_P (src) > && GET_MODE (src) == XOmode && FP_REGNO_P (REGNO (src))) > emit_insn (gen_mma_xxmfacc (src, src)); > > @@ -28034,7 +28039,7 @@ rs6000_split_multireg_move (rtx dst, rtx src) > > /* If we are writing an accumulator register, we have to > prime it after we've written it. */ > - if (TARGET_MMA && REG_P (dst) > + if (TARGET_MMA && !TARGET_DENSE_MATH && REG_P (dst) > && GET_MODE (dst) == XOmode && FP_REGNO_P (REGNO (dst))) > emit_insn (gen_mma_xxmtacc (dst, dst)); > ≈
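As an aside on the rs6000-c.cc hunk quoted above: user code could select a code path from the predefined macros roughly as sketched below. This is only an illustration under assumptions. The helper name accumulator_kind and the strings it returns are hypothetical; only __MMA__ and __PPC_DMR__ come from the patch.

```c
/* Hypothetical helper (not part of GCC or the patch): reports which
   accumulator implementation the compiler says it is targeting.  */
const char *accumulator_kind(void)
{
#if defined(__PPC_DMR__)
  /* Dense math: MMA instructions operate on the separate DMRs.  */
  return "dmr";
#elif defined(__MMA__)
  /* MMA without dense math: accumulators overlap the FPRs.  */
  return "fpr";
#else
  /* No MMA support at all (for example a non-PowerPC host).  */
  return "none";
#endif
}
```

__PPC_DMR__ is tested first so that a target defining both macros is still reported as using DMRs.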
On Sun, Feb 04, 2024 at 11:21:49AM +0800, Kewen.Lin wrote:
> Hi Mike,
>
> > --- a/gcc/config/rs6000/mma.md
> > +++ b/gcc/config/rs6000/mma.md
> > @@ -559,190 +559,249 @@ (define_insn "*mma_disassemble_acc_dm"
> >    "dmxxextfdmr256 %0,%1,2"
> >    [(set_attr "type" "mma")])
> >
> > -(define_insn "mma_<acc>"
> > +;; MMA instructions that do not use their accumulators as an input, still must
> > +;; not allow their vector operands to overlap the registers used by the
> > +;; accumulator. We enforce this by marking the output as early clobber. If we
> > +;; have dense math, we don't need the whole prime/de-prime action, so just make
> > +;; thse instructions be NOPs.
>
> typo: thse.

Ok.

> > +
> > +(define_expand "mma_<acc>"
> > +  [(set (match_operand:XO 0 "register_operand")
> > +        (unspec:XO [(match_operand:XO 1 "register_operand")]
>
> s/register_operand/accumulator_operand/?

Ok.

> > +                   MMA_ACC))]
> > +  "TARGET_MMA"
> > +{
> > +  if (TARGET_DENSE_MATH)
> > +    {
> > +      if (!rtx_equal_p (operands[0], operands[1]))
> > +        emit_move_insn (operands[0], operands[1]);
> > +      DONE;
> > +    }
> > +
> > +  /* Generate the prime/de-prime code. */
> > +})
> > +
> > +(define_insn "*mma_<acc>"
>
> May be better to name with "*mma_<acc>_nodm"?

Ok.

> >    [(set (match_operand:XO 0 "fpr_reg_operand" "=&d")
> >          (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0")]
> >                     MMA_ACC))]
> > -  "TARGET_MMA"
> > +  "TARGET_MMA && !TARGET_DENSE_MATH"
>
> I found that "TARGET_MMA && !TARGET_DENSE_MATH" is used much (like changes in function
> rs6000_split_multireg_move in this patch and some places in previous patches), maybe we
> can introduce a macro named as TARGET_MMA_NODM short for it?

As I said in the message about the last patch, I added TARGET_MMA_NO_DENSE_MATH.

> >    "<acc> %A0"
> >    [(set_attr "type" "mma")])
> >
> >  ;; We can't have integer constants in XOmode so we wrap this in an
> > -;; UNSPEC_VOLATILE.
> > +;; UNSPEC_VOLATILE for the non-dense math case. For dense math, we don't need
> > +;; to disable optimization and we can do a normal UNSPEC.
> >
> > -(define_insn "mma_xxsetaccz"
> > -  [(set (match_operand:XO 0 "fpr_reg_operand" "=d")
> > +(define_expand "mma_xxsetaccz"
> > +  [(set (match_operand:XO 0 "register_operand")
>
> s/register_operand/accumulator_operand/?

Ok.

> >          (unspec_volatile:XO [(const_int 0)]
> >                              UNSPECV_MMA_XXSETACCZ))]
> >    "TARGET_MMA"
> > +{
> > +  if (TARGET_DENSE_MATH)
> > +    {
> > +      emit_insn (gen_mma_xxsetaccz_dm (operands[0]));
> > +      DONE;
> > +    }
> > +})
> > +
> > +(define_insn "*mma_xxsetaccz_vsx"
>
> s/vsx/nodm/

Ok.

> > +  [(set (match_operand:XO 0 "fpr_reg_operand" "=d")
> > +        (unspec_volatile:XO [(const_int 0)]
> > +                            UNSPECV_MMA_XXSETACCZ))]
> > +  "TARGET_MMA && !TARGET_DENSE_MATH"
> >    "xxsetaccz %A0"
> >    [(set_attr "type" "mma")])
> >
> > +
> > +(define_insn "mma_xxsetaccz_dm"
> > +  [(set (match_operand:XO 0 "dmr_operand" "=wD")
> > +        (unspec:XO [(const_int 0)]
> > +                   UNSPECV_MMA_XXSETACCZ))]
> > +  "TARGET_DENSE_MATH"
> > +  "dmsetdmrz %0"
> > +  [(set_attr "type" "mma")])
> > +
> >  (define_insn "mma_<vv>"
> > -  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
> > -        (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa")
> > -                    (match_operand:V16QI 2 "vsx_register_operand" "v,?wa")]
> > +  [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d")
> > +        (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "wa,v,?wa")
> > +                    (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa")]
> >                     MMA_VV))]
> >    "TARGET_MMA"
> >    "<vv> %A0,%x1,%x2"
> > -  [(set_attr "type" "mma")])
> > +  [(set_attr "type" "mma")
> > +   (set_attr "isa" "dm,not_dm,not_dm")])
>
> Like what's suggested in previous patches, s/not_dm/nodm/

Ok.
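To make the TARGET_MMA_NO_DENSE_MATH discussion above concrete, here is a minimal standalone model of the guard that the patch places around the xxmtacc/xxmfacc calls in rs6000_split_multireg_move. It is a sketch under assumptions, not GCC code. The struct and function names are hypothetical, and the expansion of TARGET_MMA_NO_DENSE_MATH is assumed from the condition it is said to replace (the reply only names the macro).

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the target flags; in GCC these come from
   the option masks, not from a struct like this.  */
struct target_flags {
  bool mma;        /* models TARGET_MMA */
  bool dense_math; /* models TARGET_DENSE_MATH */
};

/* Assumed meaning of TARGET_MMA_NO_DENSE_MATH, i.e.
   TARGET_MMA && !TARGET_DENSE_MATH.  */
bool target_mma_no_dense_math(struct target_flags t)
{
  return t.mma && !t.dense_math;
}

/* Models the guard from the patch: prime/de-prime is only needed for
   an XOmode value held in FPRs when dense math is not available.  */
bool needs_prime_or_deprime(struct target_flags t, bool is_xo_mode,
                            bool in_fpr)
{
  return target_mma_no_dense_math(t) && is_xo_mode && in_fpr;
}
```

With flags like -mcpu=power10 (mma set, dense_math clear) the guard is true for an XOmode value in FPRs; with flags like -mcpu=future (both set) it is false, so no prime/de-prime instruction would be emitted.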
diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md index bb898919ab5..525a85146ff 100644 --- a/gcc/config/rs6000/mma.md +++ b/gcc/config/rs6000/mma.md @@ -559,190 +559,249 @@ (define_insn "*mma_disassemble_acc_dm" "dmxxextfdmr256 %0,%1,2" [(set_attr "type" "mma")]) -(define_insn "mma_<acc>" +;; MMA instructions that do not use their accumulators as an input, still must +;; not allow their vector operands to overlap the registers used by the +;; accumulator. We enforce this by marking the output as early clobber. If we +;; have dense math, we don't need the whole prime/de-prime action, so just make +;; thse instructions be NOPs. + +(define_expand "mma_<acc>" + [(set (match_operand:XO 0 "register_operand") + (unspec:XO [(match_operand:XO 1 "register_operand")] + MMA_ACC))] + "TARGET_MMA" +{ + if (TARGET_DENSE_MATH) + { + if (!rtx_equal_p (operands[0], operands[1])) + emit_move_insn (operands[0], operands[1]); + DONE; + } + + /* Generate the prime/de-prime code. */ +}) + +(define_insn "*mma_<acc>" [(set (match_operand:XO 0 "fpr_reg_operand" "=&d") (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0")] MMA_ACC))] - "TARGET_MMA" + "TARGET_MMA && !TARGET_DENSE_MATH" "<acc> %A0" [(set_attr "type" "mma")]) ;; We can't have integer constants in XOmode so we wrap this in an -;; UNSPEC_VOLATILE. +;; UNSPEC_VOLATILE for the non-dense math case. For dense math, we don't need +;; to disable optimization and we can do a normal UNSPEC. 
-(define_insn "mma_xxsetaccz" - [(set (match_operand:XO 0 "fpr_reg_operand" "=d") +(define_expand "mma_xxsetaccz" + [(set (match_operand:XO 0 "register_operand") (unspec_volatile:XO [(const_int 0)] UNSPECV_MMA_XXSETACCZ))] "TARGET_MMA" +{ + if (TARGET_DENSE_MATH) + { + emit_insn (gen_mma_xxsetaccz_dm (operands[0])); + DONE; + } +}) + +(define_insn "*mma_xxsetaccz_vsx" + [(set (match_operand:XO 0 "fpr_reg_operand" "=d") + (unspec_volatile:XO [(const_int 0)] + UNSPECV_MMA_XXSETACCZ))] + "TARGET_MMA && !TARGET_DENSE_MATH" "xxsetaccz %A0" [(set_attr "type" "mma")]) + +(define_insn "mma_xxsetaccz_dm" + [(set (match_operand:XO 0 "dmr_operand" "=wD") + (unspec:XO [(const_int 0)] + UNSPECV_MMA_XXSETACCZ))] + "TARGET_DENSE_MATH" + "dmsetdmrz %0" + [(set_attr "type" "mma")]) + (define_insn "mma_<vv>" - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") - (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa") - (match_operand:V16QI 2 "vsx_register_operand" "v,?wa")] + [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d") + (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "wa,v,?wa") + (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa")] MMA_VV))] "TARGET_MMA" "<vv> %A0,%x1,%x2" - [(set_attr "type" "mma")]) + [(set_attr "type" "mma") + (set_attr "isa" "dm,not_dm,not_dm")]) (define_insn "mma_<avv>" - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") - (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0") - (match_operand:V16QI 2 "vsx_register_operand" "v,?wa") - (match_operand:V16QI 3 "vsx_register_operand" "v,?wa")] + [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d") + (unspec:XO [(match_operand:XO 1 "accumulator_operand" "0,0,0") + (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa") + (match_operand:V16QI 3 "vsx_register_operand" "wa,v,?wa")] MMA_AVV))] "TARGET_MMA" "<avv> %A0,%x2,%x3" - [(set_attr "type" "mma")]) + [(set_attr "type" "mma") + (set_attr "isa" "dm,not_dm,not_dm")]) (define_insn "mma_<pv>" - [(set 
(match_operand:XO 0 "fpr_reg_operand" "=&d,&d") - (unspec:XO [(match_operand:OO 1 "vsx_register_operand" "v,?wa") - (match_operand:V16QI 2 "vsx_register_operand" "v,?wa")] + [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d") + (unspec:XO [(match_operand:OO 1 "vsx_register_operand" "wa,v,?wa") + (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa")] MMA_PV))] "TARGET_MMA" "<pv> %A0,%x1,%x2" - [(set_attr "type" "mma")]) + [(set_attr "type" "mma") + (set_attr "isa" "dm,not_dm,not_dm")]) (define_insn "mma_<apv>" - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") - (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0") - (match_operand:OO 2 "vsx_register_operand" "v,?wa") - (match_operand:V16QI 3 "vsx_register_operand" "v,?wa")] + [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d") + (unspec:XO [(match_operand:XO 1 "accumulator_operand" "0,0,0") + (match_operand:OO 2 "vsx_register_operand" "wa,v,?wa") + (match_operand:V16QI 3 "vsx_register_operand" "wa,v,?wa")] MMA_APV))] "TARGET_MMA" "<apv> %A0,%x2,%x3" - [(set_attr "type" "mma")]) + [(set_attr "type" "mma") + (set_attr "isa" "dm,not_dm,not_dm")]) (define_insn "mma_<vvi4i4i8>" - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") - (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa") - (match_operand:V16QI 2 "vsx_register_operand" "v,?wa") - (match_operand:SI 3 "const_0_to_15_operand" "n,n") - (match_operand:SI 4 "const_0_to_15_operand" "n,n") - (match_operand:SI 5 "u8bit_cint_operand" "n,n")] + [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d") + (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "wa,v,?wa") + (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa") + (match_operand:SI 3 "const_0_to_15_operand" "n,n,n") + (match_operand:SI 4 "const_0_to_15_operand" "n,n,n") + (match_operand:SI 5 "u8bit_cint_operand" "n,n,n")] MMA_VVI4I4I8))] "TARGET_MMA" "<vvi4i4i8> %A0,%x1,%x2,%3,%4,%5" [(set_attr "type" "mma") - (set_attr "prefixed" "yes")]) + (set_attr 
"prefixed" "yes") + (set_attr "isa" "dm,not_dm,not_dm")]) (define_insn "mma_<avvi4i4i8>" - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") - (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0") - (match_operand:V16QI 2 "vsx_register_operand" "v,?wa") - (match_operand:V16QI 3 "vsx_register_operand" "v,?wa") - (match_operand:SI 4 "const_0_to_15_operand" "n,n") - (match_operand:SI 5 "const_0_to_15_operand" "n,n") - (match_operand:SI 6 "u8bit_cint_operand" "n,n")] + [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d") + (unspec:XO [(match_operand:XO 1 "accumulator_operand" "0,0,0") + (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa") + (match_operand:V16QI 3 "vsx_register_operand" "wa,v,?wa") + (match_operand:SI 4 "const_0_to_15_operand" "n,n,n") + (match_operand:SI 5 "const_0_to_15_operand" "n,n,n") + (match_operand:SI 6 "u8bit_cint_operand" "n,n,n")] MMA_AVVI4I4I8))] "TARGET_MMA" "<avvi4i4i8> %A0,%x2,%x3,%4,%5,%6" [(set_attr "type" "mma") - (set_attr "prefixed" "yes")]) + (set_attr "prefixed" "yes") + (set_attr "isa" "dm,not_dm,not_dm")]) (define_insn "mma_<vvi4i4i2>" - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") - (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa") - (match_operand:V16QI 2 "vsx_register_operand" "v,?wa") - (match_operand:SI 3 "const_0_to_15_operand" "n,n") - (match_operand:SI 4 "const_0_to_15_operand" "n,n") - (match_operand:SI 5 "const_0_to_3_operand" "n,n")] + [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d") + (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "wa,v,?wa") + (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa") + (match_operand:SI 3 "const_0_to_15_operand" "n,n,n") + (match_operand:SI 4 "const_0_to_15_operand" "n,n,n") + (match_operand:SI 5 "const_0_to_3_operand" "n,n,n")] MMA_VVI4I4I2))] "TARGET_MMA" "<vvi4i4i2> %A0,%x1,%x2,%3,%4,%5" [(set_attr "type" "mma") - (set_attr "prefixed" "yes")]) + (set_attr "prefixed" "yes") + (set_attr "isa" "dm,not_dm,not_dm")]) 
 (define_insn "mma_<avvi4i4i2>"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
-        (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0")
-                    (match_operand:V16QI 2 "vsx_register_operand" "v,?wa")
-                    (match_operand:V16QI 3 "vsx_register_operand" "v,?wa")
-                    (match_operand:SI 4 "const_0_to_15_operand" "n,n")
-                    (match_operand:SI 5 "const_0_to_15_operand" "n,n")
-                    (match_operand:SI 6 "const_0_to_3_operand" "n,n")]
+  [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d")
+        (unspec:XO [(match_operand:XO 1 "accumulator_operand" "0,0,0")
+                    (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa")
+                    (match_operand:V16QI 3 "vsx_register_operand" "wa,v,?wa")
+                    (match_operand:SI 4 "const_0_to_15_operand" "n,n,n")
+                    (match_operand:SI 5 "const_0_to_15_operand" "n,n,n")
+                    (match_operand:SI 6 "const_0_to_3_operand" "n,n,n")]
                     MMA_AVVI4I4I2))]
   "TARGET_MMA"
   "<avvi4i4i2> %A0,%x2,%x3,%4,%5,%6"
   [(set_attr "type" "mma")
-   (set_attr "prefixed" "yes")])
+   (set_attr "prefixed" "yes")
+   (set_attr "isa" "dm,not_dm,not_dm")])

 (define_insn "mma_<vvi4i4>"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
-        (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa")
-                    (match_operand:V16QI 2 "vsx_register_operand" "v,?wa")
-                    (match_operand:SI 3 "const_0_to_15_operand" "n,n")
-                    (match_operand:SI 4 "const_0_to_15_operand" "n,n")]
+  [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d")
+        (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "wa,v,?wa")
+                    (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa")
+                    (match_operand:SI 3 "const_0_to_15_operand" "n,n,n")
+                    (match_operand:SI 4 "const_0_to_15_operand" "n,n,n")]
                     MMA_VVI4I4))]
   "TARGET_MMA"
   "<vvi4i4> %A0,%x1,%x2,%3,%4"
   [(set_attr "type" "mma")
-   (set_attr "prefixed" "yes")])
+   (set_attr "prefixed" "yes")
+   (set_attr "isa" "dm,not_dm,not_dm")])

 (define_insn "mma_<avvi4i4>"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
-        (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0")
-                    (match_operand:V16QI 2 "vsx_register_operand" "v,?wa")
-                    (match_operand:V16QI 3 "vsx_register_operand" "v,?wa")
-                    (match_operand:SI 4 "const_0_to_15_operand" "n,n")
-                    (match_operand:SI 5 "const_0_to_15_operand" "n,n")]
+  [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d")
+        (unspec:XO [(match_operand:XO 1 "accumulator_operand" "0,0,0")
+                    (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa")
+                    (match_operand:V16QI 3 "vsx_register_operand" "wa,v,?wa")
+                    (match_operand:SI 4 "const_0_to_15_operand" "n,n,n")
+                    (match_operand:SI 5 "const_0_to_15_operand" "n,n,n")]
                     MMA_AVVI4I4))]
   "TARGET_MMA"
   "<avvi4i4> %A0,%x2,%x3,%4,%5"
   [(set_attr "type" "mma")
-   (set_attr "prefixed" "yes")])
+   (set_attr "prefixed" "yes")
+   (set_attr "isa" "dm,not_dm,not_dm")])

 (define_insn "mma_<pvi4i2>"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
-        (unspec:XO [(match_operand:OO 1 "vsx_register_operand" "v,?wa")
-                    (match_operand:V16QI 2 "vsx_register_operand" "v,?wa")
-                    (match_operand:SI 3 "const_0_to_15_operand" "n,n")
-                    (match_operand:SI 4 "const_0_to_3_operand" "n,n")]
+  [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d")
+        (unspec:XO [(match_operand:OO 1 "vsx_register_operand" "wa,v,?wa")
+                    (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa")
+                    (match_operand:SI 3 "const_0_to_15_operand" "n,n,n")
+                    (match_operand:SI 4 "const_0_to_3_operand" "n,n,n")]
                     MMA_PVI4I2))]
   "TARGET_MMA"
   "<pvi4i2> %A0,%x1,%x2,%3,%4"
   [(set_attr "type" "mma")
-   (set_attr "prefixed" "yes")])
+   (set_attr "prefixed" "yes")
+   (set_attr "isa" "dm,not_dm,not_dm")])

 (define_insn "mma_<apvi4i2>"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
-        (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0")
-                    (match_operand:OO 2 "vsx_register_operand" "v,?wa")
-                    (match_operand:V16QI 3 "vsx_register_operand" "v,?wa")
-                    (match_operand:SI 4 "const_0_to_15_operand" "n,n")
-                    (match_operand:SI 5 "const_0_to_3_operand" "n,n")]
+  [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d")
+        (unspec:XO [(match_operand:XO 1 "accumulator_operand" "0,0,0")
+                    (match_operand:OO 2 "vsx_register_operand" "wa,v,?wa")
+                    (match_operand:V16QI 3 "vsx_register_operand" "wa,v,?wa")
+                    (match_operand:SI 4 "const_0_to_15_operand" "n,n,n")
+                    (match_operand:SI 5 "const_0_to_3_operand" "n,n,n")]
                     MMA_APVI4I2))]
   "TARGET_MMA"
   "<apvi4i2> %A0,%x2,%x3,%4,%5"
   [(set_attr "type" "mma")
-   (set_attr "prefixed" "yes")])
+   (set_attr "prefixed" "yes")
+   (set_attr "isa" "dm,not_dm,not_dm")])

 (define_insn "mma_<vvi4i4i4>"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
-        (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa")
-                    (match_operand:V16QI 2 "vsx_register_operand" "v,?wa")
-                    (match_operand:SI 3 "const_0_to_15_operand" "n,n")
-                    (match_operand:SI 4 "const_0_to_15_operand" "n,n")
-                    (match_operand:SI 5 "const_0_to_15_operand" "n,n")]
+  [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d")
+        (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "wa,v,?wa")
+                    (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa")
+                    (match_operand:SI 3 "const_0_to_15_operand" "n,n,n")
+                    (match_operand:SI 4 "const_0_to_15_operand" "n,n,n")
+                    (match_operand:SI 5 "const_0_to_15_operand" "n,n,n")]
                     MMA_VVI4I4I4))]
   "TARGET_MMA"
   "<vvi4i4i4> %A0,%x1,%x2,%3,%4,%5"
   [(set_attr "type" "mma")
-   (set_attr "prefixed" "yes")])
+   (set_attr "prefixed" "yes")
+   (set_attr "isa" "dm,not_dm,not_dm")])

 (define_insn "mma_<avvi4i4i4>"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
-        (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0")
-                    (match_operand:V16QI 2 "vsx_register_operand" "v,?wa")
-                    (match_operand:V16QI 3 "vsx_register_operand" "v,?wa")
-                    (match_operand:SI 4 "const_0_to_15_operand" "n,n")
-                    (match_operand:SI 5 "const_0_to_15_operand" "n,n")
-                    (match_operand:SI 6 "const_0_to_15_operand" "n,n")]
+  [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d")
+        (unspec:XO [(match_operand:XO 1 "accumulator_operand" "0,0,0")
+                    (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa")
+                    (match_operand:V16QI 3 "vsx_register_operand" "wa,v,?wa")
+                    (match_operand:SI 4 "const_0_to_15_operand" "n,n,n")
+                    (match_operand:SI 5 "const_0_to_15_operand" "n,n,n")
+                    (match_operand:SI 6 "const_0_to_15_operand" "n,n,n")]
                     MMA_AVVI4I4I4))]
   "TARGET_MMA"
   "<avvi4i4i4> %A0,%x2,%x3,%4,%5,%6"
   [(set_attr "type" "mma")
-   (set_attr "prefixed" "yes")])
+   (set_attr "prefixed" "yes")
+   (set_attr "isa" "dm,not_dm,not_dm")])
diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
index f2fb5bef678..4342620f87f 100644
--- a/gcc/config/rs6000/rs6000-c.cc
+++ b/gcc/config/rs6000/rs6000-c.cc
@@ -600,6 +600,9 @@ rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags)
   /* Tell the user if we support the MMA instructions. */
   if ((flags & OPTION_MASK_MMA) != 0)
     rs6000_define_or_undefine_macro (define_p, "__MMA__");
+  /* Tell the user if we support the dense math instructions. */
+  if ((flags & OPTION_MASK_DENSE_MATH) != 0)
+    rs6000_define_or_undefine_macro (define_p, "__PPC_DMR__");
   /* Whether pc-relative code is being generated. */
   if ((flags & OPTION_MASK_PCREL) != 0)
     rs6000_define_or_undefine_macro (define_p, "__PCREL__");
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 83e32f7a43a..59517c8608d 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -14264,8 +14264,13 @@ print_operand (FILE *file, rtx x, int code)
          overlapping with the FPR registers. */
       if (!REG_P (x))
         output_operand_lossage ("invalid %%A value");
-      else if (TARGET_DENSE_MATH && DMR_REGNO_P (REGNO (x)))
-        fprintf (file, "%d", REGNO (x) - FIRST_DMR_REGNO);
+      else if (TARGET_DENSE_MATH)
+        {
+          if (DMR_REGNO_P (REGNO (x)))
+            fprintf (file, "%d", REGNO (x) - FIRST_DMR_REGNO);
+          else
+            output_operand_lossage ("%%A operand is not a DMR");
+        }
       else if (!FP_REGNO_P (REGNO (x)) || (REGNO (x) % 4) != 0)
         output_operand_lossage ("invalid %%A value");
       else
@@ -27719,7 +27724,7 @@ rs6000_split_multireg_move (rtx dst, rtx src)

           /* If we are reading an accumulator register, we have to
              deprime it before we can access it. */
-          if (TARGET_MMA
+          if (TARGET_MMA && !TARGET_DENSE_MATH
               && GET_MODE (src) == XOmode && FP_REGNO_P (REGNO (src)))
             emit_insn (gen_mma_xxmfacc (src, src));

@@ -27751,9 +27756,9 @@ rs6000_split_multireg_move (rtx dst, rtx src)
               emit_insn (gen_rtx_SET (dst2, src2));
             }

-          /* If we are writing an accumulator register, we have to
-             prime it after we've written it. */
-          if (TARGET_MMA
+          /* If we are writing an accumulator register that overlaps with the
+             FPR registers, we have to prime it after we've written it. */
+          if (TARGET_MMA && !TARGET_DENSE_MATH
               && GET_MODE (dst) == XOmode && FP_REGNO_P (REGNO (dst)))
             emit_insn (gen_mma_xxmtacc (dst, dst));

@@ -27822,9 +27827,9 @@ rs6000_split_multireg_move (rtx dst, rtx src)
               emit_insn (gen_rtx_SET (dst_i, op));
             }

-          /* We are writing an accumulator register, so we have to
-             prime it after we've written it. */
-          if (GET_MODE (src) == XOmode)
+          /* On systems without dense math where accumulators overlap with the
+             vector registers, we have to prime it after we've written it. */
+          if (GET_MODE (src) == XOmode && !TARGET_DENSE_MATH)
             emit_insn (gen_mma_xxmtacc (dst, dst));

           return;
@@ -27835,9 +27840,9 @@ rs6000_split_multireg_move (rtx dst, rtx src)

   if (REG_P (src) && REG_P (dst) && (REGNO (src) < REGNO (dst)))
     {
-      /* If we are reading an accumulator register, we have to
-         deprime it before we can access it. */
-      if (TARGET_MMA
+      /* If we are reading an accumulator register and we don't have dense
+         math, we have to deprime it before we can access it. */
+      if (TARGET_MMA && !TARGET_DENSE_MATH
           && GET_MODE (src) == XOmode && FP_REGNO_P (REGNO (src)))
         emit_insn (gen_mma_xxmfacc (src, src));

@@ -27865,7 +27870,7 @@ rs6000_split_multireg_move (rtx dst, rtx src)

       /* If we are writing an accumulator register, we have to
          prime it after we've written it. */
-      if (TARGET_MMA
+      if (TARGET_MMA && !TARGET_DENSE_MATH
           && GET_MODE (dst) == XOmode && FP_REGNO_P (REGNO (dst)))
         emit_insn (gen_mma_xxmtacc (dst, dst));
     }
@@ -28002,7 +28007,7 @@ rs6000_split_multireg_move (rtx dst, rtx src)

       /* If we are reading an accumulator register, we have to
          deprime it before we can access it. */
-      if (TARGET_MMA && REG_P (src)
+      if (TARGET_MMA && !TARGET_DENSE_MATH && REG_P (src)
           && GET_MODE (src) == XOmode && FP_REGNO_P (REGNO (src)))
         emit_insn (gen_mma_xxmfacc (src, src));

@@ -28034,7 +28039,7 @@ rs6000_split_multireg_move (rtx dst, rtx src)

       /* If we are writing an accumulator register, we have to
          prime it after we've written it. */
-      if (TARGET_MMA && !TARGET_DENSE_MATH && REG_P (dst)
           && GET_MODE (dst) == XOmode && FP_REGNO_P (REGNO (dst)))
         emit_insn (gen_mma_xxmtacc (dst, dst));
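
For illustration only (this is not part of the patch): the rs6000-c.cc hunk defines __PPC_DMR__ alongside the existing __MMA__ macro, so user code could tell the two accumulator configurations apart roughly as sketched below. The helper name accumulator_backing is hypothetical; only the two macro names come from the patch.

```c
/* Hypothetical helper, not part of the patch: report which register
   file backs the MMA accumulators, judging only by the predefined
   macros that rs6000_target_modify_macros sets.  */
static const char *
accumulator_backing (void)
{
#if defined (__PPC_DMR__)
  /* Dense math enabled: the MMA insns use the DMRs.  */
  return "DMR";
#elif defined (__MMA__)
  /* MMA without dense math: accumulators overlap the FPRs.  */
  return "FPR";
#else
  /* No MMA support for this target or option set.  */
  return "none";
#endif
}
```

Going by the patch description, a build with -mcpu=future would be expected to take the first branch and one with -mcpu=power10 the second; on any other target the helper simply returns "none".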