Message ID | 20240813124150.1168825-2-victor.donascimento@arm.com |
---|---|
State | New |
Series | optabs: Make all `*dot_prod_optab's modeled as conversions |
Victor Do Nascimento <victor.donascimento@arm.com> writes:
> Given the specification in the GCC internals manual defines the
> {u|s}dot_prod<m> standard name as taking "two signed elements of the
> same mode, adding them to a third operand of wider mode", there is
> currently ambiguity in the relationship between the mode of the first
> two arguments and that of the third.
>
> This vagueness means that, in theory, different modes may be
> supportable in the third argument. This flexibility would allow for a
> given backend to add to the accumulator a different number of
> vectorized products, e.g. A backend may provide instructions for both:
>
>   accum += a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]
>
> and
>
>   accum += a[0] * b[0] + a[1] * b[1],
>
> as is now seen in the SVE2.1 extension to AArch64. In spite of the
> aforementioned flexibility, modeling the dot-product operation as a
> direct optab means that we have no way to encode both input and the
> accumulator data modes into the backend pattern name, which prevents
> us from harnessing this flexibility.
>
> We therefore make all dot_prod optabs conversions, allowing, for
> example, for the encoding of both 2-way and 4-way dot product backend
> patterns.
>
> gcc/ChangeLog:
>
> 	* optabs.def (sdot_prod_optab): Convert from OPTAB_D to
> 	OPTAB_CD.
> 	(udot_prod_optab): Likewise.
> 	(usdot_prod_optab): Likewise.
> 	* doc/md.texi (Standard Names): update entries for u,s and us
> 	dot_prod names.
> ---
>  gcc/doc/md.texi | 46 +++++++++++++++++++++-------------------------
>  gcc/optabs.def | 6 +++---
>  2 files changed, 24 insertions(+), 28 deletions(-)
>
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 5dc0d55edd6..aa1181a3320 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5760,15 +5760,14 @@ for (i = 0; i < LEN + BIAS; i++)
>  operand0 += operand2[i];
>  @end smallexample
>
> -@cindex @code{sdot_prod@var{m}} instruction pattern
> -@item @samp{sdot_prod@var{m}}
> -
> -Compute the sum of the products of two signed elements.
> -Operand 1 and operand 2 are of the same mode. Their
> -product, which is of a wider mode, is computed and added to operand 3.
> -Operand 3 is of a mode equal or wider than the mode of the product. The
> -result is placed in operand 0, which is of the same mode as operand 3.
> -@var{m} is the mode of operand 1 and operand 2.
> +@cindex @code{sdot_prod@var{m}@var{n}} instruction pattern
> +@item @samp{sdot_prod@var{m}@var{n}}
> +
> +Multiply operand 1 by operand 2 without loss of precision, given that
> +both operands contain signed elements. Add each product to the overlapping
> +element of operand 3 and store the result in operand 0. Operands 0 and 3
> +have mode @var{m} and operands 1 and 2 have mode @var{n}, with @var{n}
> +having narrower elements than @var{m}.
>
>  Semantically the expressions perform the multiplication in the following signs
>
> @@ -5778,15 +5777,14 @@ sdot<signed op0, signed op1, signed op2, signed op3> ==
>  @dots{}
>  @end smallexample
>
> -@cindex @code{udot_prod@var{m}} instruction pattern
> -@item @samp{udot_prod@var{m}}
> +@cindex @code{udot_prod@var{m}@var{n}} instruction pattern
> +@item @samp{udot_prod@var{m}@var{n}}
>
> -Compute the sum of the products of two unsigned elements.
> -Operand 1 and operand 2 are of the same mode. Their
> -product, which is of a wider mode, is computed and added to operand 3.
> -Operand 3 is of a mode equal or wider than the mode of the product. The
> -result is placed in operand 0, which is of the same mode as operand 3.
> -@var{m} is the mode of operand 1 and operand 2.
> +Multiply operand 1 by operand 2 without loss of precision, given that
> +both operands contain unsigned elements. Add each product to the overlapping
> +element of operand 3 and store the result in operand 0. Operands 0 and 3
> +have mode @var{m} and operands 1 and 2 have mode @var{n}, with @var{n}
> +having narrower elements than @var{m}.
>
>  Semantically the expressions perform the multiplication in the following signs
>
> @@ -5796,14 +5794,12 @@ udot<unsigned op0, unsigned op1, unsigned op2, unsigned op3> ==
>  @dots{}
>  @end smallexample
>
> -@cindex @code{usdot_prod@var{m}} instruction pattern
> -@item @samp{usdot_prod@var{m}}
> -Compute the sum of the products of elements of different signs.
> -Operand 1 must be unsigned and operand 2 signed. Their
> -product, which is of a wider mode, is computed and added to operand 3.
> -Operand 3 is of a mode equal or wider than the mode of the product. The
> -result is placed in operand 0, which is of the same mode as operand 3.
> -@var{m} is the mode of operand 1 and operand 2.
> +@cindex @code{usdot_prod@var{m}@var{n}} instruction pattern
> +@item @samp{usdot_prod@var{m}@var{n}}
> +Multiply operand 1 by operand 2. Add each product to the overlapping

The new paragraph drops the information that operand 1 is unsigned and
operand 2 is signed. Maybe change this sentence to:

  Multiply operand 1 by operand 2 without loss of precision, given that
  operand 1 is unsigned and operand 2 is signed.

OK with that change, thanks.

Richard

> +element of operand 3 and store the result in operand 0. Operands 0 and 3
> +have mode @var{m} and operands 1 and 2 have mode @var{n}, with @var{n}
> +having narrower elements than @var{m}.
>
>  Semantically the expressions perform the multiplication in the following signs
>
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index 58a939442bd..ba860144d8b 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -110,6 +110,9 @@ OPTAB_CD(mask_scatter_store_optab, "mask_scatter_store$a$b")
>  OPTAB_CD(mask_len_scatter_store_optab, "mask_len_scatter_store$a$b")
>  OPTAB_CD(vec_extract_optab, "vec_extract$a$b")
>  OPTAB_CD(vec_init_optab, "vec_init$a$b")
> +OPTAB_CD (sdot_prod_optab, "sdot_prod$I$a$b")
> +OPTAB_CD (udot_prod_optab, "udot_prod$I$a$b")
> +OPTAB_CD (usdot_prod_optab, "usdot_prod$I$a$b")
>
>  OPTAB_CD (while_ult_optab, "while_ult$a$b")
>
> @@ -413,10 +416,7 @@ OPTAB_D (savg_floor_optab, "avg$a3_floor")
>  OPTAB_D (uavg_floor_optab, "uavg$a3_floor")
>  OPTAB_D (savg_ceil_optab, "avg$a3_ceil")
>  OPTAB_D (uavg_ceil_optab, "uavg$a3_ceil")
> -OPTAB_D (sdot_prod_optab, "sdot_prod$I$a")
>  OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3")
> -OPTAB_D (udot_prod_optab, "udot_prod$I$a")
> -OPTAB_D (usdot_prod_optab, "usdot_prod$I$a")
>  OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
>  OPTAB_D (usad_optab, "usad$I$a")
>  OPTAB_D (ssad_optab, "ssad$I$a")
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 5dc0d55edd6..aa1181a3320 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5760,15 +5760,14 @@ for (i = 0; i < LEN + BIAS; i++)
 operand0 += operand2[i];
 @end smallexample

-@cindex @code{sdot_prod@var{m}} instruction pattern
-@item @samp{sdot_prod@var{m}}
-
-Compute the sum of the products of two signed elements.
-Operand 1 and operand 2 are of the same mode. Their
-product, which is of a wider mode, is computed and added to operand 3.
-Operand 3 is of a mode equal or wider than the mode of the product. The
-result is placed in operand 0, which is of the same mode as operand 3.
-@var{m} is the mode of operand 1 and operand 2.
+@cindex @code{sdot_prod@var{m}@var{n}} instruction pattern
+@item @samp{sdot_prod@var{m}@var{n}}
+
+Multiply operand 1 by operand 2 without loss of precision, given that
+both operands contain signed elements. Add each product to the overlapping
+element of operand 3 and store the result in operand 0. Operands 0 and 3
+have mode @var{m} and operands 1 and 2 have mode @var{n}, with @var{n}
+having narrower elements than @var{m}.

 Semantically the expressions perform the multiplication in the following signs

@@ -5778,15 +5777,14 @@ sdot<signed op0, signed op1, signed op2, signed op3> ==
 @dots{}
 @end smallexample

-@cindex @code{udot_prod@var{m}} instruction pattern
-@item @samp{udot_prod@var{m}}
+@cindex @code{udot_prod@var{m}@var{n}} instruction pattern
+@item @samp{udot_prod@var{m}@var{n}}

-Compute the sum of the products of two unsigned elements.
-Operand 1 and operand 2 are of the same mode. Their
-product, which is of a wider mode, is computed and added to operand 3.
-Operand 3 is of a mode equal or wider than the mode of the product. The
-result is placed in operand 0, which is of the same mode as operand 3.
-@var{m} is the mode of operand 1 and operand 2.
+Multiply operand 1 by operand 2 without loss of precision, given that
+both operands contain unsigned elements. Add each product to the overlapping
+element of operand 3 and store the result in operand 0. Operands 0 and 3
+have mode @var{m} and operands 1 and 2 have mode @var{n}, with @var{n}
+having narrower elements than @var{m}.

 Semantically the expressions perform the multiplication in the following signs

@@ -5796,14 +5794,12 @@ udot<unsigned op0, unsigned op1, unsigned op2, unsigned op3> ==
 @dots{}
 @end smallexample

-@cindex @code{usdot_prod@var{m}} instruction pattern
-@item @samp{usdot_prod@var{m}}
-Compute the sum of the products of elements of different signs.
-Operand 1 must be unsigned and operand 2 signed. Their
-product, which is of a wider mode, is computed and added to operand 3.
-Operand 3 is of a mode equal or wider than the mode of the product. The
-result is placed in operand 0, which is of the same mode as operand 3.
-@var{m} is the mode of operand 1 and operand 2.
+@cindex @code{usdot_prod@var{m}@var{n}} instruction pattern
+@item @samp{usdot_prod@var{m}@var{n}}
+Multiply operand 1 by operand 2. Add each product to the overlapping
+element of operand 3 and store the result in operand 0. Operands 0 and 3
+have mode @var{m} and operands 1 and 2 have mode @var{n}, with @var{n}
+having narrower elements than @var{m}.

 Semantically the expressions perform the multiplication in the following signs

diff --git a/gcc/optabs.def b/gcc/optabs.def
index 58a939442bd..ba860144d8b 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -110,6 +110,9 @@ OPTAB_CD(mask_scatter_store_optab, "mask_scatter_store$a$b")
 OPTAB_CD(mask_len_scatter_store_optab, "mask_len_scatter_store$a$b")
 OPTAB_CD(vec_extract_optab, "vec_extract$a$b")
 OPTAB_CD(vec_init_optab, "vec_init$a$b")
+OPTAB_CD (sdot_prod_optab, "sdot_prod$I$a$b")
+OPTAB_CD (udot_prod_optab, "udot_prod$I$a$b")
+OPTAB_CD (usdot_prod_optab, "usdot_prod$I$a$b")

 OPTAB_CD (while_ult_optab, "while_ult$a$b")

@@ -413,10 +416,7 @@ OPTAB_D (savg_floor_optab, "avg$a3_floor")
 OPTAB_D (uavg_floor_optab, "uavg$a3_floor")
 OPTAB_D (savg_ceil_optab, "avg$a3_ceil")
 OPTAB_D (uavg_ceil_optab, "uavg$a3_ceil")
-OPTAB_D (sdot_prod_optab, "sdot_prod$I$a")
 OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3")
-OPTAB_D (udot_prod_optab, "udot_prod$I$a")
-OPTAB_D (usdot_prod_optab, "usdot_prod$I$a")
 OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
 OPTAB_D (usad_optab, "usad$I$a")
 OPTAB_D (ssad_optab, "ssad$I$a")
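
For readers who want the 2-way versus 4-way distinction from the commit message spelled out, here is a rough scalar sketch in C. It is not GCC code: the function names, the fixed 128-bit vector shapes (four 32-bit accumulators fed by sixteen 8-bit or eight 16-bit inputs), and the assumption that the sums fit in `int32_t` are all illustrative choices. The third function models the mixed-sign case raised in the review, where operand 1 is unsigned and operand 2 is signed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a 4-way signed dot product: each 32-bit
   accumulator element absorbs the four 8-bit products that overlap it.
   Assumes the running sums fit in int32_t.  */
static void
sdot_prod_4way_model (int32_t acc[4], const int8_t a[16], const int8_t b[16])
{
  assert (acc && a && b);
  for (size_t i = 0; i < 4; i++)
    for (size_t j = 0; j < 4; j++)
      acc[i] += (int32_t) a[4 * i + j] * (int32_t) b[4 * i + j];
}

/* Hypothetical model of a 2-way signed dot product: same accumulator
   shape, but each element absorbs only two 16-bit products.  */
static void
sdot_prod_2way_model (int32_t acc[4], const int16_t a[8], const int16_t b[8])
{
  assert (acc && a && b);
  for (size_t i = 0; i < 4; i++)
    for (size_t j = 0; j < 2; j++)
      acc[i] += (int32_t) a[2 * i + j] * (int32_t) b[2 * i + j];
}

/* Hypothetical model of the mixed-sign 4-way form: operand 1 (a) is
   unsigned and operand 2 (b) is signed, so each input is widened
   according to its own signedness before multiplying.  */
static void
usdot_prod_4way_model (int32_t acc[4], const uint8_t a[16], const int8_t b[16])
{
  assert (acc && a && b);
  for (size_t i = 0; i < 4; i++)
    for (size_t j = 0; j < 4; j++)
      acc[i] += (int32_t) a[4 * i + j] * (int32_t) b[4 * i + j];
}
```

Both signed variants produce the same accumulator mode, which is why a name built only from one mode cannot tell them apart. With the `"sdot_prod$I$a$b"` template from the patch, and assuming the usual V4SI, V16QI and V8HI mode names, the two would presumably expand to distinct names along the lines of `sdot_prodv4siv16qi` and `sdot_prodv4siv8hi`.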