Message ID | 163c9a9d-6755-4fbc-9c5a-82475fd1d4b1@linux.ibm.com |
---|---|
State | New |
Headers | show |
Series | rs6000, Add new overloaded vector shift builtin int128, varients | expand |
On 7/19/24 3:04 PM, Carl Love wrote: > diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md > index 5af9bf920a2..2a18ee44526 100644 > --- a/gcc/config/rs6000/altivec.md > +++ b/gcc/config/rs6000/altivec.md > @@ -878,9 +878,9 @@ (define_int_attr SLDB_lr [(UNSPEC_SLDB "l") > (define_int_iterator VSHIFT_DBL_LR [UNSPEC_SLDB UNSPEC_SRDB]) > > (define_insn "vs<SLDB_lr>db_<mode>" > - [(set (match_operand:VI2 0 "register_operand" "=v") > - (unspec:VI2 [(match_operand:VI2 1 "register_operand" "v") > - (match_operand:VI2 2 "register_operand" "v") > + [(set (match_operand:VEC_IC 0 "register_operand" "=v") > + (unspec:VEC_IC [(match_operand:VEC_IC 1 "register_operand" "v") > + (match_operand:VEC_IC 2 "register_operand" "v") > (match_operand:QI 3 "const_0_to_12_operand" "n")] > VSHIFT_DBL_LR))] > "TARGET_POWER10" I know the old code used the register_operand predicate for the vector operands, but those really should be changed to altivec_register_operand. Peter
Peter: On 7/23/24 2:26 PM, Peter Bergner wrote: > On 7/19/24 3:04 PM, Carl Love wrote: >> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md >> index 5af9bf920a2..2a18ee44526 100644 >> --- a/gcc/config/rs6000/altivec.md >> +++ b/gcc/config/rs6000/altivec.md >> @@ -878,9 +878,9 @@ (define_int_attr SLDB_lr [(UNSPEC_SLDB "l") >> (define_int_iterator VSHIFT_DBL_LR [UNSPEC_SLDB UNSPEC_SRDB]) >> >> (define_insn "vs<SLDB_lr>db_<mode>" >> - [(set (match_operand:VI2 0 "register_operand" "=v") >> - (unspec:VI2 [(match_operand:VI2 1 "register_operand" "v") >> - (match_operand:VI2 2 "register_operand" "v") >> + [(set (match_operand:VEC_IC 0 "register_operand" "=v") >> + (unspec:VEC_IC [(match_operand:VEC_IC 1 "register_operand" "v") >> + (match_operand:VEC_IC 2 "register_operand" "v") >> (match_operand:QI 3 "const_0_to_12_operand" "n")] >> VSHIFT_DBL_LR))] >> "TARGET_POWER10" > I know the old code used the register_operand predicate for the vector > operands, but those really should be changed to altivec_register_operand. > > Peter OK, will add that change and retest the patch. Thanks. Carl
Hi! So much manual stuff needed, sigh. On Fri, Jul 19, 2024 at 01:04:12PM -0700, Carl Love wrote: > gcc/ChangeLog: > * config/rs6000/altivec.md (vs<SLDB_lr>db_<mode>): Change > define_insn iterator to VEC_IC. From VI2 (a nothing-saying name) to VEC_IC (also a nonsensical name). Maybe VEC_IC should have a comment explaining the TARGET_POWER10 thing at least? Just something like "ISA 3.1 added 128-bit things" or whatever, but don't leave the reader second-guessing, a reader will often guess wrong :-) > gcc/testsuite/ChangeLog: > * gcc.target/powerpc/vec-shift-double-runnable-int128.c: New test > file. Please don't line-wrap where not wanted. Changelog lines are 80 character positions wide. (Or 79 if you want, but heh). > +The above instances are extension of the exiting overloaded built-ins (existing) > a/gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable-int128.c > b/gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable-int128.c > new file mode 100644 > index 00000000000..bb90f489149 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable-int128.c > @@ -0,0 +1,349 @@ > +/* { dg-do run { target { int128 } && { power10_hw } } } */ Everything power10 is int128 always. > +/* { dg-do link { target { ! power10_hw } } } */ > +/* { dg-require-effective-target power10_ok } */ So this is enough always. Often we have two testcases, one for run, one for compiling only. It's a bit simpler and cleaner. > +/* { dg-options "-mdejagnu-cpu=power10 -save-temps" } */ Why the -save-temps? Always document it if you want that for something, but never put it in the testcase if not. A leftover from development? Okay for trunk, thank you! Well Peter had some comments too, modulo those I guess, I'll read them now ;-) Segher
On Tue, Jul 23, 2024 at 04:26:43PM -0500, Peter Bergner wrote: > On 7/19/24 3:04 PM, Carl Love wrote: > > diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md > > index 5af9bf920a2..2a18ee44526 100644 > > --- a/gcc/config/rs6000/altivec.md > > +++ b/gcc/config/rs6000/altivec.md > > @@ -878,9 +878,9 @@ (define_int_attr SLDB_lr [(UNSPEC_SLDB "l") > > (define_int_iterator VSHIFT_DBL_LR [UNSPEC_SLDB UNSPEC_SRDB]) > > > > (define_insn "vs<SLDB_lr>db_<mode>" > > - [(set (match_operand:VI2 0 "register_operand" "=v") > > - (unspec:VI2 [(match_operand:VI2 1 "register_operand" "v") > > - (match_operand:VI2 2 "register_operand" "v") > > + [(set (match_operand:VEC_IC 0 "register_operand" "=v") > > + (unspec:VEC_IC [(match_operand:VEC_IC 1 "register_operand" "v") > > + (match_operand:VEC_IC 2 "register_operand" "v") > > (match_operand:QI 3 "const_0_to_12_operand" "n")] > > VSHIFT_DBL_LR))] > > "TARGET_POWER10" > > I know the old code used the register_operand predicate for the vector > operands, but those really should be changed to altivec_register_operand. register_operand is just fine usually. The "v" constraint already makes sure things end up in a VMX (a lower VSX) register, the predicate doesn't help here. register_operand is shorter (and thus, preferred), and also more likely correct if the code changes later :-) Segher
On 7/24/24 12:06 PM, Segher Boessenkool wrote: > On Tue, Jul 23, 2024 at 04:26:43PM -0500, Peter Bergner wrote: >> On 7/19/24 3:04 PM, Carl Love wrote: >>> (define_insn "vs<SLDB_lr>db_<mode>" >>> - [(set (match_operand:VI2 0 "register_operand" "=v") >>> - (unspec:VI2 [(match_operand:VI2 1 "register_operand" "v") >>> - (match_operand:VI2 2 "register_operand" "v") >>> + [(set (match_operand:VEC_IC 0 "register_operand" "=v") >>> + (unspec:VEC_IC [(match_operand:VEC_IC 1 "register_operand" "v") >>> + (match_operand:VEC_IC 2 "register_operand" "v") >>> (match_operand:QI 3 "const_0_to_12_operand" "n")] >>> VSHIFT_DBL_LR))] >>> "TARGET_POWER10" >> >> I know the old code used the register_operand predicate for the vector >> operands, but those really should be changed to altivec_register_operand. > > register_operand is just fine usually. The "v" constraint already makes > sure things end up in a VMX (a lower VSX) register, the predicate > doesn't help here. register_operand is shorter (and thus, preferred), > and also more likely correct if the code changes later :-) I thought we always wanted the predicate to match the constraint being used? Peter
On 7/24/24 12:03 PM, Segher Boessenkool wrote: >> +/* { dg-options "-mdejagnu-cpu=power10 -save-temps" } */ > > Why the -save-temps? Always document it if you want that for something, > but never put it in the testcase if not. A leftover from development? I can answer this one. :-). Since these are dg-do run/link tests, we need the -save-temps to keep the assembler files around for Carl's scan-assembler tests at the end of the test case. Peter
On Wed, Jul 24, 2024 at 12:12:05PM -0500, Peter Bergner wrote: > On 7/24/24 12:06 PM, Segher Boessenkool wrote: > I thought we always wanted the predicate to match the constraint being used? Predicates and constraints have different purposes, and are used at different times (typically). Everything before RA is predicates only, and RA and everything after it use constraints (as well). register_operand says it has to be a register. It allows any pseudo-register, so before RA, there is no real difference between register_operand and altivec_register_operand (which allows all pseudos as well).. The constraint should not demand things that weren't clear earlier, because that will then cause reloading eventually, often with less efficient code. It still will *work* though. But that is not the case here :-) Segher
On Wed, Jul 24, 2024 at 12:16:33PM -0500, Peter Bergner wrote: > > On 7/24/24 12:03 PM, Segher Boessenkool wrote: > >> +/* { dg-options "-mdejagnu-cpu=power10 -save-temps" } */ > > > > Why the -save-temps? Always document it if you want that for something, > > but never put it in the testcase if not. A leftover from development? > > I can answer this one. :-). Since these are dg-do run/link tests, > we need the -save-temps to keep the assembler files around for Carl's > scan-assembler tests at the end of the test case. Ah! Gotcha. Please add a comment then? Just something trivial, a short line is enough :-) Segher
Segher: Thanks for the review, a few questions... On 7/24/24 10:03 AM, Segher Boessenkool wrote: > Hi! > > So much manual stuff needed, sigh. > > On Fri, Jul 19, 2024 at 01:04:12PM -0700, Carl Love wrote: >> gcc/ChangeLog: >> * config/rs6000/altivec.md (vs<SLDB_lr>db_<mode>): Change >> define_insn iterator to VEC_IC. > From VI2 (a nothing-saying name) to VEC_IC (also a nonsensical name). > > Maybe VEC_IC should have a comment explaining the TARGET_POWER10 thing > at least? Just something like "ISA 3.1 added 128-bit things" or > whatever, but don't leave the reader second-guessing, a reader will > often guess wrong :-) I don't disagree that the reader will guess wrong, probably after being frustated that it isn't obvious. :-) The VEC_IC was an existing definition, this patch does not add it. Your comments seems to imply you want a comment on the definition for VEC_IC in vector.md? I could add one to the existing definition if you like but it seems outside the scope of this patch. The change log entry could be improved to say "Change define_insn iterator to VEC_IC which included the V1TI type added in ISA 3.1." Would that address your concerns? > >> gcc/testsuite/ChangeLog: >> * gcc.target/powerpc/vec-shift-double-runnable-int128.c: New test >> file. > Please don't line-wrap where not wanted. Changelog lines are 80 > character positions wide. (Or 79 if you want, but heh). Yea, it does look like file will just fit on the same line. Fixed. > >> +The above instances are extension of the exiting overloaded built-ins > (existing) Fixed spelling error. > >> a/gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable-int128.c >> b/gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable-int128.c >> new file mode 100644 >> index 00000000000..bb90f489149 >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable-int128.c >> @@ -0,0 +1,349 @@ >> +/* { dg-do run { target { int128 } && { power10_hw } } } */ > Everything power10 is int128 always. OK, so don't need the power10_hw. Changed to just int128 for the target: /* { dg-do run target int128 } */ >> +/* { dg-do link { target { ! power10_hw } } } */ >> +/* { dg-require-effective-target power10_ok } */ > So this is enough always. > > Often we have two testcases, one for run, one for compiling only. It's > a bit simpler and cleaner. Sounds like you would prefer to have a run and a compile test file? I will create a new file vec-shift-double-int128.c consisting of a series of functions to test each built-in definition. > >> +/* { dg-options "-mdejagnu-cpu=power10 -save-temps" } */ > Why the -save-temps? Always document it if you want that for something, > but never put it in the testcase if not. A leftover from development? > > Okay for trunk, thank you! Well Peter had some comments too, modulo > those I guess, I'll read them now ;-) So as Peter said, the save-temps is because the runnable case file also checks for assembler times at the end of the file. I will move the scan-assembler-times checks to the new compile only test. Carl
Hi! On Wed, Jul 24, 2024 at 11:38:11AM -0700, Carl Love wrote: > On 7/24/24 10:03 AM, Segher Boessenkool wrote: > >So much manual stuff needed, sigh. > > > >On Fri, Jul 19, 2024 at 01:04:12PM -0700, Carl Love wrote: > >>gcc/ChangeLog: > >> * config/rs6000/altivec.md (vs<SLDB_lr>db_<mode>): Change > >> define_insn iterator to VEC_IC. > > From VI2 (a nothing-saying name) to VEC_IC (also a nonsensical name). > > > >Maybe VEC_IC should have a comment explaining the TARGET_POWER10 thing > >at least? Just something like "ISA 3.1 added 128-bit things" or > >whatever, but don't leave the reader second-guessing, a reader will > >often guess wrong :-) > > I don't disagree that the reader will guess wrong, probably after being > frustated that it isn't obvious. :-) > The VEC_IC was an existing definition, this patch does not add it. Your > comments seems to imply you want a comment on the definition for VEC_IC > in vector.md? I could add one to the existing definition if you like > but it seems outside the scope of this patch. Yes, I'm just lamenting the state of things :-) It would have to be a separate patch, yes. A trivial patch to add such a comment is pre-approved :-) > The change log entry could be improved to say "Change define_insn > iterator to VEC_IC which included the V1TI type added in ISA 3.1." Would > that address your concerns? The current changelog is fine. Changelogs never can replace comments in the code. > >>+/* { dg-do run { target { int128 } && { power10_hw } } } */ > >Everything power10 is int128 always. > > OK, so don't need the power10_hw. Changed to just int128 for the target: No, the other way around: you cannot run the code on machines without these (ISA 3.1) instructions! But p10 always satisfies the int128 predicate. Although, hrm, how about -m32 :-) > /* { dg-do run target int128 } */ > >>+/* { dg-do link { target { ! power10_hw } } } */ > >>+/* { dg-require-effective-target power10_ok } */ > >So this is enough always. > > > >Often we have two testcases, one for run, one for compiling only. It's > >a bit simpler and cleaner. > > Sounds like you would prefer to have a run and a compile test file? I > will create a new file vec-shift-double-int128.c consisting of a series > of functions to test each built-in definition. No, I don't prefer that, but it is easier to handle (also for you). That that results in a bit more files, who cares, I don't anyway :-) > >>+/* { dg-options "-mdejagnu-cpu=power10 -save-temps" } */ > >Why the -save-temps? Always document it if you want that for something, > >but never put it in the testcase if not. A leftover from development? > > > >Okay for trunk, thank you! Well Peter had some comments too, modulo > >those I guess, I'll read them now ;-) > So as Peter said, the save-temps is because the runnable case file also > checks for assembler times at the end of the file. Yup. A comment would help :-) Segher
Hi Carl, Some minor comments are inlined on top of Segher's and Peter's comments. on 2024/7/20 04:04, Carl Love wrote: > GCC developers: > > The following patch adds the int128 varients to the existing overloaded built-ins vec_sld, vec_sldb, vec_sldw, vec_sll, vec_slo, vec_srdb, vec_srl, vec_sro. These varients were requested by Steve Munroe. > > The patch has been tested on a Power 10 system with no regressions. > > Please let me know if the patch is acceptable for mainline. > > Carl > > > ------------------------------------------------------------------------------------------------------- > rs6000, Add new overloaded vector shift builtin int128 varients > > Add the signed __int128 and unsigned __int128 argument types for the > overloaded built-ins vec_sld, vec_sldb, vec_sldw, vec_sll, vec_slo, > vec_srdb, vec_srl, vec_sro. For each of the new argument types add a > testcase and update the documentation for the built-in. > > Add the missing internal names for the float and double types for > overloaded builtin vec_sld for the float and double types. This isn't needed, see below explanation. > > gcc/ChangeLog: > * config/rs6000/altivec.md (vs<SLDB_lr>db_<mode>): Change > define_insn iterator to VEC_IC. > * config/rs6000/rs6000-builtins.def (__builtin_altivec_vsldoi_v1ti, > __builtin_vsx_xxsldwi_v1ti, __builtin_altivec_vsldb_v1ti, > __builtin_altivec_vsrdb_v1ti): New builtin definitions. > * config/rs6000/rs6000-overload.def (vec_sld, vec_sldb, vec_sldw, > vec_sll, vec_slo, vec_srdb, vec_srl, vec_sro): New overloaded > definitions. > (vec_sld): Add missing internal names. > * doc/extend.texi (vec_sld, vec_sldb, vec_sldw, vec_sll, vec_slo, > vec_srdb, vec_srl, vec_sro): Add documentation for new overloaded > built-ins. > > gcc/testsuite/ChangeLog: > * gcc.target/powerpc/vec-shift-double-runnable-int128.c: New test > file. > --- > gcc/config/rs6000/altivec.md | 6 +- > gcc/config/rs6000/rs6000-builtins.def | 12 + > gcc/config/rs6000/rs6000-overload.def | 44 ++- > gcc/doc/extend.texi | 42 +++ > .../vec-shift-double-runnable-int128.c | 349 ++++++++++++++++++ > 5 files changed, 448 insertions(+), 5 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable-int128.c > > diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md > index 5af9bf920a2..2a18ee44526 100644 > --- a/gcc/config/rs6000/altivec.md > +++ b/gcc/config/rs6000/altivec.md > @@ -878,9 +878,9 @@ (define_int_attr SLDB_lr [(UNSPEC_SLDB "l") > (define_int_iterator VSHIFT_DBL_LR [UNSPEC_SLDB UNSPEC_SRDB]) > > (define_insn "vs<SLDB_lr>db_<mode>" > - [(set (match_operand:VI2 0 "register_operand" "=v") > - (unspec:VI2 [(match_operand:VI2 1 "register_operand" "v") > - (match_operand:VI2 2 "register_operand" "v") > + [(set (match_operand:VEC_IC 0 "register_operand" "=v") > + (unspec:VEC_IC [(match_operand:VEC_IC 1 "register_operand" "v") > + (match_operand:VEC_IC 2 "register_operand" "v") > (match_operand:QI 3 "const_0_to_12_operand" "n")] > VSHIFT_DBL_LR))] > "TARGET_POWER10" > diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def > index 77eb0f7e406..fbb6e1ddf85 100644 > --- a/gcc/config/rs6000/rs6000-builtins.def > +++ b/gcc/config/rs6000/rs6000-builtins.def > @@ -964,6 +964,9 @@ > const vss __builtin_altivec_vsldoi_8hi (vss, vss, const int<4>); > VSLDOI_8HI altivec_vsldoi_v8hi {} > > + const vsq __builtin_altivec_vsldoi_v1ti (vsq, vsq, const int<4>); > + VSLDOI_V1TI altivec_vsldoi_v1ti {} > + > const vss __builtin_altivec_vslh (vss, vus); > VSLH vashlv8hi3 {} > > @@ -1831,6 +1834,9 @@ > const vsll __builtin_vsx_xxsldwi_2di (vsll, vsll, const int<2>); > XXSLDWI_2DI vsx_xxsldwi_v2di {} > > + const vsq __builtin_vsx_xxsldwi_v1ti (vsq, vsq, const int<2>); > + XXSLDWI_Q vsx_xxsldwi_v1ti {} > + > const vf __builtin_vsx_xxsldwi_4sf (vf, vf, const int<2>); > XXSLDWI_4SF vsx_xxsldwi_v4sf {} > > @@ -3299,6 +3305,9 @@ > const vss __builtin_altivec_vsldb_v8hi (vss, vss, const int<3>); > VSLDB_V8HI vsldb_v8hi {} > > + const vsq __builtin_altivec_vsldb_v1ti (vsq, vsq, const int<3>); > + VSLDB_V1TI vsldb_v1ti {} > + > const vsq __builtin_altivec_vslq (vsq, vuq); > VSLQ vashlv1ti3 {} > > @@ -3317,6 +3326,9 @@ > const vss __builtin_altivec_vsrdb_v8hi (vss, vss, const int<3>); > VSRDB_V8HI vsrdb_v8hi {} > > + const vsq __builtin_altivec_vsrdb_v1ti (vsq, vsq, const int<3>); > + VSRDB_V1TI vsrdb_v1ti {} > + > const vsq __builtin_altivec_vsrq (vsq, vuq); > VSRQ vlshrv1ti3 {} > > diff --git a/gcc/config/rs6000/rs6000-overload.def b/gcc/config/rs6000/rs6000-overload.def > index c4ecafc6f7e..302e0232533 100644 > --- a/gcc/config/rs6000/rs6000-overload.def > +++ b/gcc/config/rs6000/rs6000-overload.def > @@ -3396,9 +3396,13 @@ > vull __builtin_vec_sld (vull, vull, const int); > VSLDOI_2DI VSLDOI_VULL > vf __builtin_vec_sld (vf, vf, const int); > - VSLDOI_4SF > + VSLDOI_4SF VSLDOI_VF > vd __builtin_vec_sld (vd, vd, const int); > - VSLDOI_2DF > + VSLDOI_2DF VSLDOI_VD The other instances for vector integer type have multiple uses of 1st token, such as: vsll __builtin_vec_sld (vsll, vsll, const int); VSLDOI_2DI VSLDOI_VSLL vbll __builtin_vec_sld (vbll, vbll, const int); VSLDOI_2DI VSLDOI_VBLL vull __builtin_vec_sld (vull, vull, const int); VSLDOI_2DI VSLDOI_VULL , it's unable to use the 1st token VSLDOI_2DI for the overload id (otherwise it can be ambiguous), but for vector float/double they don't have multiple variants, VSLDOI_4SF and VSLDOI_2DF are used once respectively so they are fine here. I think the existing code is intentional so let's keep them unchanged (creating more unnecessary ids is slightly worse than before). > + vsq __builtin_vec_sld (vsq, vsq, const int); > + VSLDOI_V1TI VSLDOI_VSQ > + vuq __builtin_vec_sld (vuq, vuq, const int); > + VSLDOI_V1TI VSLDOI_VUQ > > [VEC_SLDB, vec_sldb, __builtin_vec_sldb] > vsc __builtin_vec_sldb (vsc, vsc, const int); > @@ -3417,6 +3421,10 @@ > VSLDB_V2DI VSLDB_VSLL > vull __builtin_vec_sldb (vull, vull, const int); > VSLDB_V2DI VSLDB_VULL > + vsq __builtin_vec_sldb (vsq, vsq, const int); > + VSLDB_V1TI VSLDB_VSQ > + vuq __builtin_vec_sldb (vuq, vuq, const int); > + VSLDB_V1TI VSLDB_VUQ > > [VEC_SLDW, vec_sldw, __builtin_vec_sldw] > vsc __builtin_vec_sldw (vsc, vsc, const int); > @@ -3439,6 +3447,10 @@ > XXSLDWI_4SF XXSLDWI_VF > vd __builtin_vec_sldw (vd, vd, const int); > XXSLDWI_2DF XXSLDWI_VD > + vsq __builtin_vec_sldw (vsq, vsq, const int); > + XXSLDWI_Q XXSLDWI_VSQ > + vuq __builtin_vec_sldw (vuq, vuq, const int); > + XXSLDWI_Q XXSLDWI_VUQ Nit: s/XXSLDWI_Q/XXSLDWI_1TI/ to keep consistent with the other XXSLDWI_* as 1st token (XXSLDWI_16QI etc. are used above rather than XXSLDWI_{SC,UC} etc.) > > [VEC_SLL, vec_sll, __builtin_vec_sll] > vsc __builtin_vec_sll (vsc, vuc); > @@ -3459,6 +3471,10 @@ > VSL VSL_VSLL > vull __builtin_vec_sll (vull, vuc); > VSL VSL_VULL > + vsq __builtin_vec_sll (vsq, vuc); > + VSL VSL_VSQ > + vuq __builtin_vec_sll (vuq, vuc); > + VSL VSL_VUQ > ; The following variants are deprecated. > vsc __builtin_vec_sll (vsc, vus); > VSL VSL_VSC_VUS > @@ -3554,6 +3570,14 @@ > VSLO VSLO_VFS > vf __builtin_vec_slo (vf, vuc); > VSLO VSLO_VFU > + vsq __builtin_vec_slo (vsq, vsc); > + VSLO VSLDO_VSQS > + vsq __builtin_vec_slo (vsq, vuc); > + VSLO VSLDO_VSQU > + vuq __builtin_vec_slo (vuq, vsc); > + VSLO VSLDO_VUQS > + vuq __builtin_vec_slo (vuq, vuc); > + VSLO VSLDO_VUQU > > [VEC_SLV, vec_slv, __builtin_vec_vslv] > vuc __builtin_vec_vslv (vuc, vuc); > @@ -3699,6 +3723,10 @@ > VSRDB_V2DI VSRDB_VSLL > vull __builtin_vec_srdb (vull, vull, const int); > VSRDB_V2DI VSRDB_VULL > + vsq __builtin_vec_srdb (vsq, vsq, const int); > + VSRDB_V1TI VSRDB_VSQ > + vuq __builtin_vec_srdb (vuq, vuq, const int); > + VSRDB_V1TI VSRDB_VUQ > > [VEC_SRL, vec_srl, __builtin_vec_srl] > vsc __builtin_vec_srl (vsc, vuc); > @@ -3719,6 +3747,10 @@ > VSR VSR_VSLL > vull __builtin_vec_srl (vull, vuc); > VSR VSR_VULL > + vsq __builtin_vec_srl (vsq, vuc); > + VSR VSR_VSQ > + vuq __builtin_vec_srl (vuq, vuc); > + VSR VSR_VUQ > ; The following variants are deprecated. > vsc __builtin_vec_srl (vsc, vus); > VSR VSR_VSC_VUS > @@ -3808,6 +3840,14 @@ > VSRO VSRO_VFS > vf __builtin_vec_sro (vf, vuc); > VSRO VSRO_VFU > + vsq __builtin_vec_sro (vsq, vsc); > + VSRO VSRDO_VSQS > + vsq __builtin_vec_sro (vsq, vuc); > + VSRO VSRDO_VSQU > + vuq __builtin_vec_sro (vuq, vsc); > + VSRO VSRDO_VUQS > + vuq __builtin_vec_sro (vuq, vuc); > + VSRO VSRDO_VUQU > > [VEC_SRV, vec_srv, __builtin_vec_vsrv] > vuc __builtin_vec_vsrv (vuc, vuc); > diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi > index 0b572afca72..5125a6d9def 100644 > --- a/gcc/doc/extend.texi > +++ b/gcc/doc/extend.texi > @@ -23504,6 +23504,10 @@ const unsigned int); > vector signed long long, const unsigned int); > @exdent vector unsigned long long vec_sldb (vector unsigned long long, > vector unsigned long long, const unsigned int); > +@exdent vector signed __int128 vec_sldb (vector signed __int128, > +vector signed __int128, const unsigned int); > +@exdent vector unsigned __int128 vec_sldb (vector unsigned __int128, > +vector unsigned __int128, const unsigned int); > @end smallexample > > Shift the combined input vectors left by the amount specified by the low-order > @@ -23531,6 +23535,10 @@ const unsigned int); > vector signed long long, const unsigned int); > @exdent vector unsigned long long vec_srdb (vector unsigned long long, > vector unsigned long long, const unsigned int); > +@exdent vector signed __int128 vec_srdb (vector signed __int128, > +vector signed __int128, const unsigned int); > +@exdent vector unsigned __int128 vec_srdb (vector unsigned __int128, > +vector unsigned __int128, const unsigned int); > @end smallexample > > Shift the combined input vectors right by the amount specified by the low-order > @@ -24025,6 +24033,40 @@ int vec_any_le (vector signed __int128, vector signed __int128); > int vec_any_le (vector unsigned __int128, vector unsigned __int128); > @end smallexample > Personally I prefer to move the below paragraph for description here to avoid that two @smallexample sections are placed together, that is: "The following instances are extension of the existing overloaded built-ins @code{vec_sld}, @code{vec_sldw}, @code{vec_slo}, @code{vec_sro}, @code{vec_srl} that are documented in the PVIPR." @smallexample .... BR, Kewen
Kewen: On 7/25/24 1:21 AM, Kewen.Lin wrote: > Hi Carl, > > Some minor comments are inlined on top of Segher's and Peter's comments. > > on 2024/7/20 04:04, Carl Love wrote: >> GCC developers: >> >> The following patch adds the int128 varients to the existing overloaded built-ins vec_sld, vec_sldb, vec_sldw, vec_sll, vec_slo, vec_srdb, vec_srl, vec_sro. These varients were requested by Steve Munroe. >> >> The patch has been tested on a Power 10 system with no regressions. >> >> Please let me know if the patch is acceptable for mainline. >> >> Carl >> >> >> ------------------------------------------------------------------------------------------------------- >> rs6000, Add new overloaded vector shift builtin int128 varients >> >> Add the signed __int128 and unsigned __int128 argument types for the >> overloaded built-ins vec_sld, vec_sldb, vec_sldw, vec_sll, vec_slo, >> vec_srdb, vec_srl, vec_sro. For each of the new argument types add a >> testcase and update the documentation for the built-in. >> >> Add the missing internal names for the float and double types for >> overloaded builtin vec_sld for the float and double types. > This isn't needed, see below explanation. OK, per comments below, removed the additional internal names. <snip> >> diff --git a/gcc/config/rs6000/rs6000-overload.def b/gcc/config/rs6000/rs6000-overload.def >> index c4ecafc6f7e..302e0232533 100644 >> --- a/gcc/config/rs6000/rs6000-overload.def >> +++ b/gcc/config/rs6000/rs6000-overload.def >> @@ -3396,9 +3396,13 @@ >> vull __builtin_vec_sld (vull, vull, const int); >> VSLDOI_2DI VSLDOI_VULL >> vf __builtin_vec_sld (vf, vf, const int); >> - VSLDOI_4SF >> + VSLDOI_4SF VSLDOI_VF >> vd __builtin_vec_sld (vd, vd, const int); >> - VSLDOI_2DF >> + VSLDOI_2DF VSLDOI_VD > The other instances for vector integer type have multiple uses of 1st token, > such as: > > vsll __builtin_vec_sld (vsll, vsll, const int); > VSLDOI_2DI VSLDOI_VSLL > vbll __builtin_vec_sld (vbll, vbll, const int); > VSLDOI_2DI VSLDOI_VBLL > vull __builtin_vec_sld (vull, vull, const int); > VSLDOI_2DI VSLDOI_VULL > > , it's unable to use the 1st token VSLDOI_2DI for the overload id (otherwise > it can be ambiguous), but for vector float/double they don't have multiple > variants, VSLDOI_4SF and VSLDOI_2DF are used once respectively so they are > fine here. I think the existing code is intentional so let's keep them > unchanged (creating more unnecessary ids is slightly worse than before). OK, removed the addtional tokens VSLDOI_VF and VSLDOI_VD. > >> + vsq __builtin_vec_sld (vsq, vsq, const int); >> + VSLDOI_V1TI VSLDOI_VSQ >> + vuq __builtin_vec_sld (vuq, vuq, const int); >> + VSLDOI_V1TI VSLDOI_VUQ >> >> [VEC_SLDB, vec_sldb, __builtin_vec_sldb] >> vsc __builtin_vec_sldb (vsc, vsc, const int); >> @@ -3417,6 +3421,10 @@ >> VSLDB_V2DI VSLDB_VSLL >> vull __builtin_vec_sldb (vull, vull, const int); >> VSLDB_V2DI VSLDB_VULL >> + vsq __builtin_vec_sldb (vsq, vsq, const int); >> + VSLDB_V1TI VSLDB_VSQ >> + vuq __builtin_vec_sldb (vuq, vuq, const int); >> + VSLDB_V1TI VSLDB_VUQ >> >> [VEC_SLDW, vec_sldw, __builtin_vec_sldw] >> vsc __builtin_vec_sldw (vsc, vsc, const int); >> @@ -3439,6 +3447,10 @@ >> XXSLDWI_4SF XXSLDWI_VF >> vd __builtin_vec_sldw (vd, vd, const int); >> XXSLDWI_2DF XXSLDWI_VD >> + vsq __builtin_vec_sldw (vsq, vsq, const int); >> + XXSLDWI_Q XXSLDWI_VSQ >> + vuq __builtin_vec_sldw (vuq, vuq, const int); >> + XXSLDWI_Q XXSLDWI_VUQ > Nit: s/XXSLDWI_Q/XXSLDWI_1TI/ to keep consistent with the > other XXSLDWI_* as 1st token (XXSLDWI_16QI etc. are used > above rather than XXSLDWI_{SC,UC} etc.) OK, changed to: vsq __builtin_vec_sldw (vsq, vsq, const int); XXSLDWI_1TI XXSLDWI_VSQ vuq __builtin_vec_sldw (vuq, vuq, const int); XXSLDWI_1TI XXSLDWI_VUQ <snip> >> [VEC_SRV, vec_srv, __builtin_vec_vsrv] >> vuc __builtin_vec_vsrv (vuc, vuc); >> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi >> index 0b572afca72..5125a6d9def 100644 >> --- a/gcc/doc/extend.texi >> +++ b/gcc/doc/extend.texi >> @@ -23504,6 +23504,10 @@ const unsigned int); >> vector signed long long, const unsigned int); >> @exdent vector unsigned long long vec_sldb (vector unsigned long long, >> vector unsigned long long, const unsigned int); >> +@exdent vector signed __int128 vec_sldb (vector signed __int128, >> +vector signed __int128, const unsigned int); >> +@exdent vector unsigned __int128 vec_sldb (vector unsigned __int128, >> +vector unsigned __int128, const unsigned int); >> @end smallexample >> >> Shift the combined input vectors left by the amount specified by the low-order >> @@ -23531,6 +23535,10 @@ const unsigned int); >> vector signed long long, const unsigned int); >> @exdent vector unsigned long long vec_srdb (vector unsigned long long, >> vector unsigned long long, const unsigned int); >> +@exdent vector signed __int128 vec_srdb (vector signed __int128, >> +vector signed __int128, const unsigned int); >> +@exdent vector unsigned __int128 vec_srdb (vector unsigned __int128, >> +vector unsigned __int128, const unsigned int); >> @end smallexample >> >> Shift the combined input vectors right by the amount specified by the low-order >> @@ -24025,6 +24033,40 @@ int vec_any_le (vector signed __int128, vector signed __int128); >> int vec_any_le (vector unsigned __int128, vector unsigned __int128); >> @end smallexample >> > Personally I prefer to move the below paragraph for description here to avoid > that two @smallexample sections are placed together, that is: > > "The following instances are extension of the existing overloaded built-ins > @code{vec_sld}, @code{vec_sldw}, @code{vec_slo}, @code{vec_sro}, @code{vec_srl} > that are documented in the PVIPR." > > @smallexample > .... OK, moved the changes for the extensions to the IPVR to right after the new vec_srb entries. Carl
Peter, Segher: On 7/23/24 2:26 PM, Peter Bergner wrote: > On 7/19/24 3:04 PM, Carl Love wrote: >> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md >> index 5af9bf920a2..2a18ee44526 100644 >> --- a/gcc/config/rs6000/altivec.md >> +++ b/gcc/config/rs6000/altivec.md >> @@ -878,9 +878,9 @@ (define_int_attr SLDB_lr [(UNSPEC_SLDB "l") >> (define_int_iterator VSHIFT_DBL_LR [UNSPEC_SLDB UNSPEC_SRDB]) >> >> (define_insn "vs<SLDB_lr>db_<mode>" >> - [(set (match_operand:VI2 0 "register_operand" "=v") >> - (unspec:VI2 [(match_operand:VI2 1 "register_operand" "v") >> - (match_operand:VI2 2 "register_operand" "v") >> + [(set (match_operand:VEC_IC 0 "register_operand" "=v") >> + (unspec:VEC_IC [(match_operand:VEC_IC 1 "register_operand" "v") >> + (match_operand:VEC_IC 2 "register_operand" "v") >> (match_operand:QI 3 "const_0_to_12_operand" "n")] >> VSHIFT_DBL_LR))] >> "TARGET_POWER10" > I know the old code used the register_operand predicate for the vector > operands, but those really should be changed to altivec_register_operand. > > Peter Segher's response was: register_operand is just fine usually. The "v" constraint already makes sure things end up in a VMX (a lower VSX) register, the predicate doesn't help here. register_operand is shorter (and thus, preferred), and also more likely correct if the code changes later 🙂 Which Peter said and Segher responded: On Wed, Jul 24, 2024 at 12:12:05PM -0500, Peter Bergner wrote: > On 7/24/24 12:06 PM, Segher Boessenkool wrote: > I thought we always wanted the predicate to match the constraint being used? Predicates and constraints have different purposes, and are used at different times (typically). Everything before RA is predicates only, and RA and everything after it use constraints (as well). register_operand says it has to be a register. It allows any pseudo-register, so before RA, there is no real difference between register_operand and altivec_register_operand (which allows all pseudos as well).. The constraint should not demand things that weren't clear earlier, because that will then cause reloading eventually, often with less efficient code. It still will*work* though. But that is not the case here 🙂 So, I think the final word here is don't change it. Carl >
Segher: On 7/24/24 11:47 AM, Segher Boessenkool wrote: > Hi! > > On Wed, Jul 24, 2024 at 11:38:11AM -0700, Carl Love wrote: >> On 7/24/24 10:03 AM, Segher Boessenkool wrote: >>> So much manual stuff needed, sigh. >>> >>> On Fri, Jul 19, 2024 at 01:04:12PM -0700, Carl Love wrote: >>>> gcc/ChangeLog: >>>> * config/rs6000/altivec.md (vs<SLDB_lr>db_<mode>): Change >>>> define_insn iterator to VEC_IC. >>> From VI2 (a nothing-saying name) to VEC_IC (also a nonsensical name). >>> >>> Maybe VEC_IC should have a comment explaining the TARGET_POWER10 thing >>> at least? Just something like "ISA 3.1 added 128-bit things" or >>> whatever, but don't leave the reader second-guessing, a reader will >>> often guess wrong :-) >> I don't disagree that the reader will guess wrong, probably after being >> frustated that it isn't obvious. :-) >> The VEC_IC was an existing definition, this patch does not add it. Your >> comments seems to imply you want a comment on the definition for VEC_IC >> in vector.md? I could add one to the existing definition if you like >> but it seems outside the scope of this patch. > Yes, I'm just lamenting the state of things :-) > > It would have to be a separate patch, yes. A trivial patch to add such > a comment is pre-approved :-) OK, no changes in this patch. Will do a second patch. Best that I post it for review anyway. > >> The change log entry could be improved to say "Change define_insn >> iterator to VEC_IC which included the V1TI type added in ISA 3.1." Would >> that address your concerns? > The current changelog is fine. Changelogs never can replace comments in > the code. OK >>>> +/* { dg-do run { target { int128 } && { power10_hw } } } */ >>> Everything power10 is int128 always. >> OK, so don't need the power10_hw. Changed to just int128 for the target: > No, the other way around: you cannot run the code on machines without > these (ISA 3.1) instructions! OK, made it { dg-do run { target power10_hw } } */ > > But p10 always satisfies the int128 predicate. Although, hrm, how > about -m32 :-) If I test with -m64 I get 8 passes: make -j 1 && make check-gcc RUNTESTFLAGS="-v -v powerpc.exp=vec-shift-double-runnable-int128.c --target_board=unix'{-m64}'" # of expected passes 8 If I test with -m32 I get unsupported test, make -j 1 && make check-gcc RUNTESTFLAGS="-v -v powerpc.exp=vec-shift-double-runnable-int128.c --target_board=unix'{-m32}'" # of unsupported tests 1 Looking further into the output the checks say: Checking pattern "hppa*-*-hpux*" with powerpc64le-unknown-linux-gnu compiler exited with status 1 output is: cc1: error: '-m32' not supported in this configuration^M check_cached_effective_target power10_hw_available: returning 0 for unix/-m32 is-effective-target: power10_hw 0 Looks like the power10_hw is sufficient to prevent the test from running. Don't need to explicitly check for int128 as well. > >> /* { dg-do run target int128 } */ >>>> +/* { dg-do link { target { ! power10_hw } } } */ >>>> +/* { dg-require-effective-target power10_ok } */ >>> So this is enough always. >>> >>> Often we have two testcases, one for run, one for compiling only. It's >>> a bit simpler and cleaner. >> Sounds like you would prefer to have a run and a compile test file? I >> will create a new file vec-shift-double-int128.c consisting of a series >> of functions to test each built-in definition. > No, I don't prefer that, but it is easier to handle (also for you). > That that results in a bit more files, who cares, I don't anyway :-) > >>>> +/* { dg-options "-mdejagnu-cpu=power10 -save-temps" } */ >>> Why the -save-temps? Always document it if you want that for something, >>> but never put it in the testcase if not. A leftover from development? >>> >>> Okay for trunk, thank you! Well Peter had some comments too, modulo >>> those I guess, I'll read them now ;-) >> So as Peter said, the save-temps is because the runnable case file also >> checks for assembler times at the end of the file. > Yup. A comment would help :-) OK, added comment. /* { dg-require-effective-target power10_ok } */ /* Need -save-temps for dg-final scan-assembler-times at end of test. */ /* { dg-options "-mdejagnu-cpu=power10 -save-temps" } */ Carl
On 7/26/24 12:07 PM, Carl Love wrote: > On 7/24/24 11:47 AM, Segher Boessenkool wrote: >>>>> +/* { dg-do run { target { int128 } && { power10_hw } } } */ >>>> Everything power10 is int128 always. >>> OK, so don't need the power10_hw. Changed to just int128 for the target: >> No, the other way around: you cannot run the code on machines without >> these (ISA 3.1) instructions! > > OK, made it { dg-do run { target power10_hw } } */ >> >> But p10 always satisfies the int128 predicate. Although, hrm, how >> about -m32 :-) > > If I test with -m64 I get 8 passes: > > make -j 1 && make check-gcc RUNTESTFLAGS="-v -v powerpc.exp=vec-shift-double-runnable-int128.c --target_board=unix'{-m64}'" > # of expected passes 8 > > If I test with -m32 I get unsupported test, > > make -j 1 && make check-gcc RUNTESTFLAGS="-v -v powerpc.exp=vec-shift-double-runnable-int128.c --target_board=unix'{-m32}'" > # of unsupported tests 1 > > Looking further into the output the checks say: > Checking pattern "hppa*-*-hpux*" with powerpc64le-unknown-linux-gnu > compiler exited with status 1 > output is: > cc1: error: '-m32' not supported in this configuration^M > > check_cached_effective_target power10_hw_available: returning 0 for unix/-m32 > is-effective-target: power10_hw 0 > > Looks like the power10_hw is sufficient to prevent the test from running. Don't need to explicitly check for int128 as well. That's only because you're testing on LE where -m32 is a not supported configuration. If you tested on BE, you'd probably see a FAIL for -m32. I think your initial target tests with int128 and power10_hw is actually correct. We have an internal Power10 partition (ltcd97-lp7) running a BE distro you can test this on. Peter
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md index 5af9bf920a2..2a18ee44526 100644 --- a/gcc/config/rs6000/altivec.md +++ b/gcc/config/rs6000/altivec.md @@ -878,9 +878,9 @@ (define_int_attr SLDB_lr [(UNSPEC_SLDB "l") (define_int_iterator VSHIFT_DBL_LR [UNSPEC_SLDB UNSPEC_SRDB]) (define_insn "vs<SLDB_lr>db_<mode>" - [(set (match_operand:VI2 0 "register_operand" "=v") - (unspec:VI2 [(match_operand:VI2 1 "register_operand" "v") - (match_operand:VI2 2 "register_operand" "v") + [(set (match_operand:VEC_IC 0 "register_operand" "=v") + (unspec:VEC_IC [(match_operand:VEC_IC 1 "register_operand" "v") + (match_operand:VEC_IC 2 "register_operand" "v") (match_operand:QI 3 "const_0_to_12_operand" "n")] VSHIFT_DBL_LR))] "TARGET_POWER10" diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def index 77eb0f7e406..fbb6e1ddf85 100644 --- a/gcc/config/rs6000/rs6000-builtins.def +++ b/gcc/config/rs6000/rs6000-builtins.def @@ -964,6 +964,9 @@ const vss __builtin_altivec_vsldoi_8hi (vss, vss, const int<4>); VSLDOI_8HI altivec_vsldoi_v8hi {} + const vsq __builtin_altivec_vsldoi_v1ti (vsq, vsq, const int<4>); + VSLDOI_V1TI altivec_vsldoi_v1ti {} + const vss __builtin_altivec_vslh (vss, vus); VSLH vashlv8hi3 {} @@ -1831,6 +1834,9 @@ const vsll __builtin_vsx_xxsldwi_2di (vsll, vsll, const int<2>); XXSLDWI_2DI vsx_xxsldwi_v2di {} + const vsq __builtin_vsx_xxsldwi_v1ti (vsq, vsq, const int<2>); + XXSLDWI_Q vsx_xxsldwi_v1ti {} + const vf __builtin_vsx_xxsldwi_4sf (vf, vf, const int<2>); XXSLDWI_4SF vsx_xxsldwi_v4sf {} @@ -3299,6 +3305,9 @@ const vss __builtin_altivec_vsldb_v8hi (vss, vss, const int<3>); VSLDB_V8HI vsldb_v8hi {} + const vsq __builtin_altivec_vsldb_v1ti (vsq, vsq, const int<3>); + VSLDB_V1TI vsldb_v1ti {} + const vsq __builtin_altivec_vslq (vsq, vuq); VSLQ vashlv1ti3 {} @@ -3317,6 +3326,9 @@ const vss __builtin_altivec_vsrdb_v8hi (vss, vss, const int<3>); VSRDB_V8HI vsrdb_v8hi {} + const vsq __builtin_altivec_vsrdb_v1ti (vsq, vsq, const int<3>); + VSRDB_V1TI vsrdb_v1ti {} + const vsq __builtin_altivec_vsrq (vsq, vuq); VSRQ vlshrv1ti3 {} diff --git a/gcc/config/rs6000/rs6000-overload.def b/gcc/config/rs6000/rs6000-overload.def index c4ecafc6f7e..302e0232533 100644 --- a/gcc/config/rs6000/rs6000-overload.def +++ b/gcc/config/rs6000/rs6000-overload.def @@ -3396,9 +3396,13 @@ vull __builtin_vec_sld (vull, vull, const int); VSLDOI_2DI VSLDOI_VULL vf __builtin_vec_sld (vf, vf, const int); - VSLDOI_4SF + VSLDOI_4SF VSLDOI_VF vd __builtin_vec_sld (vd, vd, const int); - VSLDOI_2DF + VSLDOI_2DF VSLDOI_VD + vsq __builtin_vec_sld (vsq, vsq, const int); + VSLDOI_V1TI VSLDOI_VSQ + vuq __builtin_vec_sld (vuq, vuq, const int); + VSLDOI_V1TI VSLDOI_VUQ [VEC_SLDB, vec_sldb, __builtin_vec_sldb] vsc __builtin_vec_sldb (vsc, vsc, const int); @@ -3417,6 +3421,10 @@ VSLDB_V2DI VSLDB_VSLL vull __builtin_vec_sldb (vull, vull, const int); VSLDB_V2DI VSLDB_VULL + vsq __builtin_vec_sldb (vsq, vsq, const int); + VSLDB_V1TI VSLDB_VSQ + vuq __builtin_vec_sldb (vuq, vuq, const int); + VSLDB_V1TI VSLDB_VUQ [VEC_SLDW, vec_sldw, __builtin_vec_sldw] vsc __builtin_vec_sldw (vsc, vsc, const int); @@ -3439,6 +3447,10 @@ XXSLDWI_4SF XXSLDWI_VF vd __builtin_vec_sldw (vd, vd, const int); XXSLDWI_2DF XXSLDWI_VD + vsq __builtin_vec_sldw (vsq, vsq, const int); + XXSLDWI_Q XXSLDWI_VSQ + vuq __builtin_vec_sldw (vuq, vuq, const int); + XXSLDWI_Q XXSLDWI_VUQ [VEC_SLL, vec_sll, __builtin_vec_sll] vsc __builtin_vec_sll (vsc, vuc); @@ -3459,6 +3471,10 @@ VSL VSL_VSLL vull __builtin_vec_sll (vull, vuc); VSL VSL_VULL + vsq __builtin_vec_sll (vsq, vuc); + VSL VSL_VSQ + vuq __builtin_vec_sll (vuq, vuc); + VSL VSL_VUQ ; The following variants are deprecated. vsc __builtin_vec_sll (vsc, vus); VSL VSL_VSC_VUS @@ -3554,6 +3570,14 @@ VSLO VSLO_VFS vf __builtin_vec_slo (vf, vuc); VSLO VSLO_VFU + vsq __builtin_vec_slo (vsq, vsc); + VSLO VSLDO_VSQS + vsq __builtin_vec_slo (vsq, vuc); + VSLO VSLDO_VSQU + vuq __builtin_vec_slo (vuq, vsc); + VSLO VSLDO_VUQS + vuq __builtin_vec_slo (vuq, vuc); + VSLO VSLDO_VUQU [VEC_SLV, vec_slv, __builtin_vec_vslv] vuc __builtin_vec_vslv (vuc, vuc); @@ -3699,6 +3723,10 @@ VSRDB_V2DI VSRDB_VSLL vull __builtin_vec_srdb (vull, vull, const int); VSRDB_V2DI VSRDB_VULL + vsq __builtin_vec_srdb (vsq, vsq, const int); + VSRDB_V1TI VSRDB_VSQ + vuq __builtin_vec_srdb (vuq, vuq, const int); + VSRDB_V1TI VSRDB_VUQ [VEC_SRL, vec_srl, __builtin_vec_srl] vsc __builtin_vec_srl (vsc, vuc); @@ -3719,6 +3747,10 @@ VSR VSR_VSLL vull __builtin_vec_srl (vull, vuc); VSR VSR_VULL + vsq __builtin_vec_srl (vsq, vuc); + VSR VSR_VSQ + vuq __builtin_vec_srl (vuq, vuc); + VSR VSR_VUQ ; The following variants are deprecated. vsc __builtin_vec_srl (vsc, vus); VSR VSR_VSC_VUS @@ -3808,6 +3840,14 @@ VSRO VSRO_VFS vf __builtin_vec_sro (vf, vuc); VSRO VSRO_VFU + vsq __builtin_vec_sro (vsq, vsc); + VSRO VSRDO_VSQS + vsq __builtin_vec_sro (vsq, vuc); + VSRO VSRDO_VSQU + vuq __builtin_vec_sro (vuq, vsc); + VSRO VSRDO_VUQS + vuq __builtin_vec_sro (vuq, vuc); + VSRO VSRDO_VUQU [VEC_SRV, vec_srv, __builtin_vec_vsrv] vuc __builtin_vec_vsrv (vuc, vuc); diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index 0b572afca72..5125a6d9def 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -23504,6 +23504,10 @@ const unsigned int); vector signed long long, const unsigned int); @exdent vector unsigned long long vec_sldb (vector unsigned long long, vector unsigned long long, const unsigned int); +@exdent vector signed __int128 vec_sldb (vector signed __int128, +vector signed __int128, const unsigned int); +@exdent vector unsigned __int128 vec_sldb (vector unsigned __int128, +vector unsigned __int128, const unsigned int); @end smallexample Shift the combined input vectors left by the amount specified by the low-order @@ -23531,6 +23535,10 @@ const unsigned int); vector signed long long, const unsigned int); @exdent vector unsigned long long vec_srdb (vector unsigned long long, vector unsigned long long, const unsigned int); +@exdent vector signed __int128 vec_srdb (vector signed __int128, +vector signed __int128, const unsigned int); +@exdent vector unsigned __int128 vec_srdb (vector unsigned __int128, +vector unsigned __int128, const unsigned int); @end smallexample Shift the combined input vectors right by the amount specified by the low-order @@ -24025,6 +24033,40 @@ int vec_any_le (vector signed __int128, vector signed __int128); int vec_any_le (vector unsigned __int128, vector unsigned __int128); @end smallexample +@smallexample +@exdent vector signed __int128 vec_sld (vector signed __int128, +vector signed __int128, const unsigned int); +@exdent vector unsigned __int128 vec_sld (vector unsigned __int128, +vector unsigned __int128, const unsigned int); +@exdent vector signed __int128 vec_sldw (vector signed __int128, +vector signed __int128, const unsigned int); +@exdent vector unsigned __int128 vec_sldw (vector unsigned __int, +vector unsigned __int128, const unsigned int); +@exdent vector signed __int128 vec_slo (vector signed __int128, +vector signed char); +@exdent vector signed __int128 vec_slo (vector signed __int128, +vector unsigned char); +@exdent vector unsigned __int128 vec_slo (vector unsigned __int128, +vector signed char); +@exdent vector unsigned __int128 vec_slo (vector unsigned __int128, +vector unsigned char); +@exdent vector signed __int128 vec_sro (vector signed __int128, +vector signed char); +@exdent vector signed __int128 vec_sro (vector signed __int128, +vector unsigned char); +@exdent vector unsigned __int128 vec_sro (vector unsigned __int128, +vector signed char); +@exdent vector unsigned __int128 vec_sro (vector unsigned __int128, +vector unsigned char); +@exdent vector signed __int128 vec_srl (vector signed __int128, +vector unsigned char); +@exdent vector unsigned __int128 vec_srl (vector unsigned __int128, +vector unsigned char); +@end smallexample + +The above instances are extension of the exiting overloaded built-ins +@code{vec_sld}, @code{vec_sldw}, @code{vec_slo}, @code{vec_sro}, @code{vec_srl} +that are documented in the PVIPR. @node PowerPC Hardware Transactional Memory Built-in Functions @subsection PowerPC Hardware Transactional Memory Built-in Functions diff --git a/gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable-int128.c b/gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable-int128.c new file mode 100644 index 00000000000..bb90f489149 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable-int128.c @@ -0,0 +1,349 @@ +/* { dg-do run { target { int128 } && { power10_hw } } } */ +/* { dg-do link { target { ! power10_hw } } } */ +/* { dg-require-effective-target power10_ok } */ +/* { dg-options "-mdejagnu-cpu=power10 -save-temps" } */ + +#include <altivec.h> + +#define DEBUG 0 + +#if DEBUG +#include <stdio.h> + +void print_i128 (unsigned __int128 val) +{ + printf(" 0x%016llx%016llx", + (unsigned long long)(val >> 64), + (unsigned long long)(val & 0xFFFFFFFFFFFFFFFF)); +} +#endif + +extern void abort (void); + +#if DEBUG +#define ACTION_2ARG_UNSIGNED(NAME, TYPE_NAME) \ + printf ("vec_%s (vector TYPE __int128, vector TYPE __int128) \n", #NAME); \ + printf(" src_va_s128[0] = "); \ + print_i128 ((unsigned __int128) src_va_##TYPE_NAME[0]); \ + printf("\n"); \ + printf(" src_vb_uc = 0x"); \ + for (i = 0; i < 16; i++) \ + printf("%02x", src_vb_uc[i]); \ + printf("\n"); \ + printf(" vresult[0] = "); \ + print_i128 ((unsigned __int128) vresult[0]); \ + printf("\n"); \ + printf(" expected_vresult[0] = "); \ + print_i128 ((unsigned __int128) expected_vresult[0]); \ + printf("\n"); + +#define ACTION_2ARG_SIGNED(NAME, TYPE_NAME) \ + printf ("vec_%s (vector TYPE __int128, vector TYPE __int128) \n", #NAME); \ + printf(" src_va_s128[0] = "); \ + print_i128 ((unsigned __int128) src_va_##TYPE_NAME[0]); \ + printf("\n"); \ + printf(" src_vb_sc = 0x"); \ + for (i = 0; i < 16; i++) \ + printf("%02x", src_vb_sc[i]); \ + printf("\n"); \ + printf(" vresult[0] = "); \ + print_i128 ((unsigned __int128) vresult[0]); \ + printf("\n"); \ + printf(" expected_vresult[0] = "); \ + print_i128 ((unsigned __int128) expected_vresult[0]); \ + printf("\n"); + +#define ACTION_3ARG(NAME, TYPE_NAME, CONST) \ + printf ("vec_%s (vector TYPE __int128, vector TYPE __int128, %s) \n", \ + #NAME, #CONST); \ + printf(" src_va_s128[0] = "); \ + print_i128 ((unsigned __int128) src_va_##TYPE_NAME[0]); \ + printf("\n"); \ + printf(" src_vb_s128[0] = "); \ + print_i128 ((unsigned __int128) src_vb_##TYPE_NAME[0]); \ + printf("\n"); \ + printf(" vresult[0] = "); \ + print_i128 ((unsigned __int128) vresult[0]); \ + printf("\n"); \ + printf(" expected_vresult[0] = "); \ + print_i128 ((unsigned __int128) expected_vresult[0]); \ + printf("\n"); +#else +#define ACTION_2ARG(NAME, TYPE_NAME) \ + abort(); + +#define ACTION_3ARG(NAME, TYPE_NAME, CONST) \ + abort(); +#endif + +/* Second argument of the builtin is vector unsigned char. */ +#define TEST_2ARG_UNSIGNED(NAME, TYPE, TYPE_NAME, EXP_RESULT_HI, EXP_RESULT_LO) \ + { \ + vector TYPE __int128 vresult; \ + vector TYPE __int128 expected_vresult; \ + int i; \ + \ + expected_vresult = (vector TYPE __int128) { EXP_RESULT_HI }; \ + expected_vresult = (expected_vresult << 64) | \ + (vector TYPE __int128) { EXP_RESULT_LO }; \ + vresult = vec_##NAME (src_va_##TYPE_NAME, src_vb_uc); \ + \ + if (!vec_all_eq (vresult, expected_vresult)) { \ + ACTION_2ARG_UNSIGNED(NAME, TYPE_NAME) \ + } \ + } + +/* Second argument of the builtin is vector signed char. */ +#define TEST_2ARG_SIGNED(NAME, TYPE, TYPE_NAME, EXP_RESULT_HI, EXP_RESULT_LO) \ + { \ + vector TYPE __int128 vresult; \ + vector TYPE __int128 expected_vresult; \ + int i; \ + \ + expected_vresult = (vector TYPE __int128) { EXP_RESULT_HI }; \ + expected_vresult = (expected_vresult << 64) | \ + (vector TYPE __int128) { EXP_RESULT_LO }; \ + vresult = vec_##NAME (src_va_##TYPE_NAME, src_vb_sc); \ + \ + if (!vec_all_eq (vresult, expected_vresult)) { \ + ACTION_2ARG_SIGNED(NAME, TYPE_NAME) \ + } \ + } + +#define TEST_3ARG(NAME, TYPE, TYPE_NAME, CONST, EXP_RESULT_HI, EXP_RESULT_LO) \ + { \ + vector TYPE __int128 vresult; \ + vector TYPE __int128 expected_vresult; \ + \ + expected_vresult = (vector TYPE __int128) { EXP_RESULT_HI }; \ + expected_vresult = (expected_vresult << 64) | \ + (vector TYPE __int128) { EXP_RESULT_LO }; \ + vresult = vec_##NAME (src_va_##TYPE_NAME, src_vb_##TYPE_NAME, CONST); \ + \ + if (!vec_all_eq (vresult, expected_vresult)) { \ + ACTION_3ARG(NAME, TYPE_NAME, CONST) \ + } \ + } + +int +main (int argc, char *argv []) +{ + vector signed __int128 vresult_s128; + vector signed __int128 expected_vresult_s128; + vector signed __int128 src_va_s128; + vector signed __int128 src_vb_s128; + vector unsigned __int128 vresult_u128; + vector unsigned __int128 expected_vresult_u128; + vector unsigned __int128 src_va_u128; + vector unsigned __int128 src_vb_u128; + vector signed char src_vb_sc; + vector unsigned char src_vb_uc; + + /* 128-bit vector shift right tests, vec_srdb. */ + src_va_s128 = (vector signed __int128) {0x12345678}; + src_vb_s128 = (vector signed __int128) {0xFEDCBA90}; + TEST_3ARG(srdb, signed, s128, 4, 0x8000000000000000, 0xFEDCBA9) + + src_va_u128 = (vector unsigned __int128) { 0xFEDCBA98 }; + src_vb_u128 = (vector unsigned __int128) { 0x76543210}; + TEST_3ARG(srdb, unsigned, u128, 4, 0x8000000000000000, 0x07654321) + + /* 128-bit vector shift left tests, vec_sldb. */ + src_va_s128 = (vector signed __int128) {0x123456789ABCDEF0}; + src_va_s128 = (src_va_s128 << 64) + | (vector signed __int128) {0x123456789ABCDEF0}; + src_vb_s128 = (vector signed __int128) {0xFEDCBA9876543210}; + src_vb_s128 = (src_vb_s128 << 64) + | (vector signed __int128) {0xFEDCBA9876543210}; + TEST_3ARG(sldb, signed, s128, 4, 0x23456789ABCDEF01, 0x23456789ABCDEF0F) + + src_va_u128 = (vector unsigned __int128) {0xFEDCBA9876543210}; + src_va_u128 = src_va_u128 << 64 + | (vector unsigned __int128) {0xFEDCBA9876543210}; + src_vb_u128 = (vector unsigned __int128) {0x123456789ABCDEF0}; + src_vb_u128 = src_vb_u128 << 64 + | (vector unsigned __int128) {0x123456789ABCDEF0}; + TEST_3ARG(sldb, unsigned, u128, 4, 0xEDCBA9876543210F, 0xEDCBA98765432101) + + /* Shift left by octect tests, vec_sld. Shift is by immediate value + times 8. */ + src_va_s128 = (vector signed __int128) {0x123456789ABCDEF0}; + src_va_s128 = (src_va_s128 << 64) + | (vector signed __int128) {0x123456789ABCDEF0}; + src_vb_s128 = (vector signed __int128) {0xFEDCBA9876543210}; + src_vb_s128 = (src_vb_s128 << 64) + | (vector signed __int128) {0xFEDCBA9876543210}; + TEST_3ARG(sld, signed, s128, 4, 0x9abcdef012345678, 0x9abcdef0fedcba98) + + src_va_u128 = (vector unsigned __int128) {0xFEDCBA9876543210}; + src_va_u128 = src_va_u128 << 64 + | (vector unsigned __int128) {0xFEDCBA9876543210}; + src_vb_u128 = (vector unsigned __int128) {0x123456789ABCDEF0}; + src_vb_u128 = src_vb_u128 << 64 + | (vector unsigned __int128) {0x123456789ABCDEF0}; + TEST_3ARG(sld, unsigned, u128, 4, 0x76543210fedcba98, 0x7654321012345678) + + /* Vector left shift bytes within the vector, vec_sll. */ + src_va_s128 = (vector signed __int128) {0x123456789ABCDEF0}; + src_va_s128 = (src_va_s128 << 64) + | (vector signed __int128) {0x123456789ABCDEF0}; + src_vb_uc = (vector unsigned char) {0x01, 0x01, 0x01, 0x01, + 0x01, 0x01, 0x01, 0x01, + 0x01, 0x01, 0x01, 0x01, + 0x01, 0x01, 0x01, 0x01}; + TEST_2ARG_UNSIGNED(sll, signed, s128, 0x2468acf13579bde0, + 0x2468acf13579bde0) + + src_va_u128 = (vector unsigned __int128) {0x123456789ABCDEF0}; + src_va_u128 = src_va_u128 << 64 + | (vector unsigned __int128) {0x123456789ABCDEF0}; + src_vb_uc = (vector unsigned char) {0x02, 0x02, 0x02, 0x02, + 0x02, 0x02, 0x02, 0x02, + 0x02, 0x02, 0x02, 0x02, + 0x02, 0x02, 0x02, 0x02}; + TEST_2ARG_UNSIGNED(sll, unsigned, u128, 0x48d159e26af37bc0, + 0x48d159e26af37bc0) + + /* Vector right shift bytes within the vector, vec_srl. */ + src_va_s128 = (vector signed __int128) {0x123456789ABCDEF0}; + src_va_s128 = (src_va_s128 << 64) + | (vector signed __int128) {0x123456789ABCDEF0}; + src_vb_uc = (vector unsigned char) {0x01, 0x01, 0x01, 0x01, + 0x01, 0x01, 0x01, 0x01, + 0x01, 0x01, 0x01, 0x01, + 0x01, 0x01, 0x01, 0x01}; + TEST_2ARG_UNSIGNED(srl, signed, s128, 0x091a2b3c4d5e6f78, + 0x091a2b3c4d5e6f78) + + src_va_u128 = (vector unsigned __int128) {0x123456789ABCDEF0}; + src_va_u128 = src_va_u128 << 64 + | (vector unsigned __int128) {0x123456789ABCDEF0}; + src_vb_uc = (vector unsigned char) {0x02, 0x02, 0x02, 0x02, + 0x02, 0x02, 0x02, 0x02, + 0x02, 0x02, 0x02, 0x02, + 0x02, 0x02, 0x02, 0x02}; + TEST_2ARG_UNSIGNED(srl, unsigned, u128, 0x48d159e26af37bc, + 0x48d159e26af37bc) + + /* Shift left by octect tests, vec_slo. Shift is by immediate value + bytes. Shift amount in bits 121:124. */ + src_va_s128 = (vector signed __int128) {0x123456789ABCDEF0}; + src_va_s128 = (src_va_s128 << 64) + | (vector signed __int128) {0x123456789ABCDEF0}; + /* Note vb_sc is Endian specific, this is just LE. */ + /* The left shift amount is 1 byte, i.e. 1 * 8 bits. */ + src_vb_sc = (vector signed char) {0x1 << 3, 0x0, 0x0, 0x0, + 0x0, 0x0, 0x0, 0x0, + 0x0, 0x0, 0x0, 0x0, + 0x0, 0x0, 0x0, 0x0}; + + TEST_2ARG_SIGNED(slo, signed, s128, 0x3456789ABCDEF012, + 0x3456789ABCDEF000) + src_va_s128 = (vector signed __int128) {0x123456789ABCDEF0}; + src_va_s128 = (src_va_s128 << 64) + | (vector signed __int128) {0x123456789ABCDEF0}; + /* Note vb_sc is Endian specific, this is just LE. */ + /* The left shift amount is 2 bytes, i.e. 2 * 8 bits. */ + src_vb_uc = (vector unsigned char) {0x2 << 3, 0x0, 0x0, 0x0, + 0x0, 0x0, 0x0, 0x0, + 0x0, 0x0, 0x0, 0x0, + 0x0, 0x0, 0x0, 0x0}; + TEST_2ARG_UNSIGNED(slo, signed, s128, 0x56789ABCDEF01234, + 0x56789ABCDEF00000) + + src_va_u128 = (vector unsigned __int128) {0xFEDCBA9876543210}; + src_va_u128 = src_va_u128 << 64 + | (vector unsigned __int128) {0xFEDCBA9876543210}; + /* The left shift amount is 3 bytes, i.e. 3 * 8 bits. */ + src_vb_sc = (vector signed char) {0x03<<3, 0x0, 0x0, 0x0, + 0x0, 0x0, 0x0, 0x0, + 0x0, 0x0, 0x0, 0x0, + 0x00, 0x00, 0x00, 0x0}; + TEST_2ARG_SIGNED(slo, unsigned, u128, 0x9876543210FEDCBA, + 0x9876543210000000) + + src_va_u128 = (vector unsigned __int128) {0xFEDCBA9876543210}; + src_va_u128 = src_va_u128 << 64 + | (vector unsigned __int128) {0xFEDCBA9876543210}; + /* The left shift amount is 4 bytes, i.e. 4 * 8 bits. */ + src_vb_uc = (vector unsigned char) {0x04<<3, 0x0, 0x0, 0x0, + 0x0, 0x0, 0x0, 0x0, + 0x0, 0x0, 0x0, 0x0, + 0x00, 0x00, 0x00, 0x0}; + TEST_2ARG_UNSIGNED(slo, unsigned, u128, 0x76543210FEDCBA98, + 0x7654321000000000) + + /* Shift right by octect tests, vec_sro. Shift is by immediate value + times 8. Shift amount in bits 121:124. */ + src_va_s128 = (vector signed __int128) {0x123456789ABCDEF0}; + src_va_s128 = (src_va_s128 << 64) + | (vector signed __int128) {0x123456789ABCDEF0}; + /* Note vb_sc is Endian specific, this is just LE. */ + /* The left shift amount is 1 byte, i.e. 1 * 8 bits. */ + src_vb_sc = (vector signed char) {0x1 << 3, 0x0, 0x0, 0x0, + 0x0, 0x0, 0x0, 0x0, + 0x0, 0x0, 0x0, 0x0, + 0x0, 0x0, 0x0, 0x0}; + TEST_2ARG_SIGNED(sro, signed, s128, 0x00123456789ABCDE, 0xF0123456789ABCDE) + + src_va_s128 = (vector signed __int128) {0x123456789ABCDEF0}; + src_va_s128 = (src_va_s128 << 64) + | (vector signed __int128) {0x123456789ABCDEF0}; + /* Note vb_sc is Endian specific, this is just LE. */ + /* The left shift amount is 1 byte, i.e. 1 * 8 bits. */ + src_vb_uc = (vector unsigned char) {0x2 << 3, 0x0, 0x0, 0x0, + 0x0, 0x0, 0x0, 0x0, + 0x0, 0x0, 0x0, 0x0, + 0x0, 0x0, 0x0, 0x0}; + TEST_2ARG_UNSIGNED(sro, signed, s128, 0x0000123456789ABC, + 0xDEF0123456789ABC) + + src_va_u128 = (vector unsigned __int128) {0xFEDCBA9876543210}; + src_va_u128 = src_va_u128 << 64 + | (vector unsigned __int128) {0xFEDCBA9876543210}; + /* The left shift amount is 4 bytes, i.e. 4 * 8 bits. */ + src_vb_sc = (vector signed char) {0x03<<3, 0x0, 0x0, 0x0, + 0x0, 0x0, 0x0, 0x0, + 0x0, 0x0, 0x0, 0x0, + 0x00, 0x00, 0x00, 0x0}; + TEST_2ARG_SIGNED(sro, unsigned, u128, 0x000000FEDCBA9876, + 0x543210FEDCBA9876) + + src_va_u128 = (vector unsigned __int128) {0xFEDCBA9876543210}; + src_va_u128 = src_va_u128 << 64 + | (vector unsigned __int128) {0xFEDCBA9876543210}; + /* The left shift amount is 4 bytes, i.e. 4 * 8 bits. */ + src_vb_uc = (vector unsigned char) {0x04<<3, 0x0, 0x0, 0x0, + 0x0, 0x0, 0x0, 0x0, + 0x0, 0x0, 0x0, 0x0, + 0x00, 0x00, 0x00, 0x0}; + TEST_2ARG_UNSIGNED(sro, unsigned, u128, 0x00000000FEDCBA98, + 0x76543210FEDCBA98) + + /* 128-bit vector shift left tests, vec_sldw. */ + src_va_s128 = (vector signed __int128) {0x123456789ABCDEF0}; + src_va_s128 = (src_va_s128 << 64) + | (vector signed __int128) {0x123456789ABCDEF0}; + src_vb_s128 = (vector signed __int128) {0xFEDCBA9876543210}; + src_vb_s128 = (src_vb_s128 << 64) + | (vector signed __int128) {0xFEDCBA9876543210}; + TEST_3ARG(sldw, signed, s128, 1, 0x9ABCDEF012345678, 0x9ABCDEF0FEDCBA98) + + src_va_u128 = (vector unsigned __int128) {0x123456789ABCDEF0}; + src_va_u128 = (src_va_u128 << 64) + | (vector unsigned __int128) {0x123456789ABCDEF0}; + src_vb_u128 = (vector unsigned __int128) {0xFEDCBA9876543210}; + src_vb_u128 = (src_vb_u128 << 64) + | (vector unsigned __int128) {0xFEDCBA9876543210}; + TEST_3ARG(sldw, unsigned, u128, 2, 0x123456789ABCDEF0, 0xFEDCBA9876543210) +