diff mbox series

rs6000, Add new overloaded vector shift builtin int128, varients

Message ID 163c9a9d-6755-4fbc-9c5a-82475fd1d4b1@linux.ibm.com
State New
Headers show
Series rs6000, Add new overloaded vector shift builtin int128, varients | expand

Commit Message

Carl Love July 19, 2024, 8:04 p.m. UTC
GCC developers:

The following patch adds the int128 varients to the existing overloaded 
built-ins vec_sld, vec_sldb, vec_sldw, vec_sll, vec_slo, vec_srdb, 
vec_srl, vec_sro.  These varients were requested by Steve Munroe.

The patch has been tested on a Power 10 system with no regressions.

Please let me know if the patch is acceptable for mainline.

                                    Carl


-------------------------------------------------------------------------------------------------------
  rs6000, Add new overloaded vector shift builtin int128 varients

Add the signed __int128 and unsigned __int128 argument types for the
overloaded built-ins vec_sld, vec_sldb, vec_sldw, vec_sll, vec_slo,
vec_srdb, vec_srl, vec_sro.  For each of the new argument types add a
testcase and update the documentation for the built-in.

Add the missing internal names for the float and double types for
overloaded builtin vec_sld for the float and double types.

gcc/ChangeLog:
     * config/rs6000/altivec.md (vs<SLDB_lr>db_<mode>): Change
     define_insn iterator to VEC_IC.
     * config/rs6000/rs6000-builtins.def (__builtin_altivec_vsldoi_v1ti,
     __builtin_vsx_xxsldwi_v1ti, __builtin_altivec_vsldb_v1ti,
     __builtin_altivec_vsrdb_v1ti): New builtin definitions.
     * config/rs6000/rs6000-overload.def (vec_sld, vec_sldb, vec_sldw,
     vec_sll, vec_slo, vec_srdb, vec_srl, vec_sro): New overloaded
     definitions.
     (vec_sld): Add missing internal names.
     * doc/extend.texi (vec_sld, vec_sldb, vec_sldw,    vec_sll, vec_slo,
     vec_srdb, vec_srl, vec_sro): Add documentation for new overloaded
     built-ins.

gcc/testsuite/ChangeLog:
     * gcc.target/powerpc/vec-shift-double-runnable-int128.c: New test
     file.
---
  gcc/config/rs6000/altivec.md                  |   6 +-
  gcc/config/rs6000/rs6000-builtins.def         |  12 +
  gcc/config/rs6000/rs6000-overload.def         |  44 ++-
  gcc/doc/extend.texi                           |  42 +++
  .../vec-shift-double-runnable-int128.c        | 349 ++++++++++++++++++
  5 files changed, 448 insertions(+), 5 deletions(-)
  create mode 100644 
gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable-int128.c

+
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times {\mvsrdbi\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvsldbi\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvsl\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvsr\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvslo\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mvsro\M} 4 } } */

Comments

Peter Bergner July 23, 2024, 9:26 p.m. UTC | #1
On 7/19/24 3:04 PM, Carl Love wrote:
> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index 5af9bf920a2..2a18ee44526 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -878,9 +878,9 @@ (define_int_attr SLDB_lr [(UNSPEC_SLDB "l")
>  (define_int_iterator VSHIFT_DBL_LR [UNSPEC_SLDB UNSPEC_SRDB])
> 
>  (define_insn "vs<SLDB_lr>db_<mode>"
> - [(set (match_operand:VI2 0 "register_operand" "=v")
> -  (unspec:VI2 [(match_operand:VI2 1 "register_operand" "v")
> -           (match_operand:VI2 2 "register_operand" "v")
> + [(set (match_operand:VEC_IC 0 "register_operand" "=v")
> +  (unspec:VEC_IC [(match_operand:VEC_IC 1 "register_operand" "v")
> +           (match_operand:VEC_IC 2 "register_operand" "v")
>             (match_operand:QI 3 "const_0_to_12_operand" "n")]
>            VSHIFT_DBL_LR))]
>    "TARGET_POWER10"

I know the old code used the register_operand predicate for the vector
operands, but those really should be changed to altivec_register_operand.

Peter
Carl Love July 24, 2024, 3:44 p.m. UTC | #2
Peter:

On 7/23/24 2:26 PM, Peter Bergner wrote:
> On 7/19/24 3:04 PM, Carl Love wrote:
>> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
>> index 5af9bf920a2..2a18ee44526 100644
>> --- a/gcc/config/rs6000/altivec.md
>> +++ b/gcc/config/rs6000/altivec.md
>> @@ -878,9 +878,9 @@ (define_int_attr SLDB_lr [(UNSPEC_SLDB "l")
>>   (define_int_iterator VSHIFT_DBL_LR [UNSPEC_SLDB UNSPEC_SRDB])
>>
>>   (define_insn "vs<SLDB_lr>db_<mode>"
>> - [(set (match_operand:VI2 0 "register_operand" "=v")
>> -  (unspec:VI2 [(match_operand:VI2 1 "register_operand" "v")
>> -           (match_operand:VI2 2 "register_operand" "v")
>> + [(set (match_operand:VEC_IC 0 "register_operand" "=v")
>> +  (unspec:VEC_IC [(match_operand:VEC_IC 1 "register_operand" "v")
>> +           (match_operand:VEC_IC 2 "register_operand" "v")
>>              (match_operand:QI 3 "const_0_to_12_operand" "n")]
>>             VSHIFT_DBL_LR))]
>>     "TARGET_POWER10"
> I know the old code used the register_operand predicate for the vector
> operands, but those really should be changed to altivec_register_operand.
>
> Peter

OK, will add that change and retest the patch.  Thanks.

                                              Carl
Segher Boessenkool July 24, 2024, 5:03 p.m. UTC | #3
Hi!

So much manual stuff needed, sigh.

On Fri, Jul 19, 2024 at 01:04:12PM -0700, Carl Love wrote:
> gcc/ChangeLog:
>     * config/rs6000/altivec.md (vs<SLDB_lr>db_<mode>): Change
>     define_insn iterator to VEC_IC.

From VI2 (a nothing-saying name) to VEC_IC (also a nonsensical name).

Maybe VEC_IC should have a comment explaining the TARGET_POWER10 thing
at least?  Just something like "ISA 3.1 added 128-bit things" or
whatever, but don't leave the reader second-guessing, a reader will
often guess wrong :-)

> gcc/testsuite/ChangeLog:
>     * gcc.target/powerpc/vec-shift-double-runnable-int128.c: New test
>     file.

Please don't line-wrap where not wanted.  Changelog lines are 80
character positions wide.  (Or 79 if you want, but heh).

> +The above instances are extension of the exiting overloaded built-ins

(existing)

> a/gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable-int128.c 
> b/gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable-int128.c
> new file mode 100644
> index 00000000000..bb90f489149
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable-int128.c
> @@ -0,0 +1,349 @@
> +/* { dg-do run  { target { int128 } && { power10_hw } } } */

Everything power10 is int128 always.

> +/* { dg-do link { target { ! power10_hw } } } */
> +/* { dg-require-effective-target power10_ok } */

So this is enough always.

Often we have two testcases, one for run, one for compiling only.  It's
a bit simpler and cleaner.

> +/* { dg-options "-mdejagnu-cpu=power10 -save-temps" } */

Why the -save-temps?  Always document it if you want that for something,
but never put it in the testcase if not.  A leftover from development?

Okay for trunk, thank you!  Well Peter had some comments too, modulo
those I guess, I'll read them now ;-)


Segher
Segher Boessenkool July 24, 2024, 5:06 p.m. UTC | #4
On Tue, Jul 23, 2024 at 04:26:43PM -0500, Peter Bergner wrote:
> On 7/19/24 3:04 PM, Carl Love wrote:
> > diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> > index 5af9bf920a2..2a18ee44526 100644
> > --- a/gcc/config/rs6000/altivec.md
> > +++ b/gcc/config/rs6000/altivec.md
> > @@ -878,9 +878,9 @@ (define_int_attr SLDB_lr [(UNSPEC_SLDB "l")
> >  (define_int_iterator VSHIFT_DBL_LR [UNSPEC_SLDB UNSPEC_SRDB])
> > 
> >  (define_insn "vs<SLDB_lr>db_<mode>"
> > - [(set (match_operand:VI2 0 "register_operand" "=v")
> > -  (unspec:VI2 [(match_operand:VI2 1 "register_operand" "v")
> > -           (match_operand:VI2 2 "register_operand" "v")
> > + [(set (match_operand:VEC_IC 0 "register_operand" "=v")
> > +  (unspec:VEC_IC [(match_operand:VEC_IC 1 "register_operand" "v")
> > +           (match_operand:VEC_IC 2 "register_operand" "v")
> >             (match_operand:QI 3 "const_0_to_12_operand" "n")]
> >            VSHIFT_DBL_LR))]
> >    "TARGET_POWER10"
> 
> I know the old code used the register_operand predicate for the vector
> operands, but those really should be changed to altivec_register_operand.

register_operand is just fine usually.  The "v" constraint already makes
sure things end up in a VMX (a lower VSX) register, the predicate
doesn't help here.  register_operand is shorter (and thus, preferred),
and also more likely correct if the code changes later :-)


Segher
Peter Bergner July 24, 2024, 5:12 p.m. UTC | #5
On 7/24/24 12:06 PM, Segher Boessenkool wrote:
> On Tue, Jul 23, 2024 at 04:26:43PM -0500, Peter Bergner wrote:
>> On 7/19/24 3:04 PM, Carl Love wrote:
>>>  (define_insn "vs<SLDB_lr>db_<mode>"
>>> - [(set (match_operand:VI2 0 "register_operand" "=v")
>>> -  (unspec:VI2 [(match_operand:VI2 1 "register_operand" "v")
>>> -           (match_operand:VI2 2 "register_operand" "v")
>>> + [(set (match_operand:VEC_IC 0 "register_operand" "=v")
>>> +  (unspec:VEC_IC [(match_operand:VEC_IC 1 "register_operand" "v")
>>> +           (match_operand:VEC_IC 2 "register_operand" "v")
>>>             (match_operand:QI 3 "const_0_to_12_operand" "n")]
>>>            VSHIFT_DBL_LR))]
>>>    "TARGET_POWER10"
>>
>> I know the old code used the register_operand predicate for the vector
>> operands, but those really should be changed to altivec_register_operand.
> 
> register_operand is just fine usually.  The "v" constraint already makes
> sure things end up in a VMX (a lower VSX) register, the predicate
> doesn't help here.  register_operand is shorter (and thus, preferred),
> and also more likely correct if the code changes later :-)

I thought we always wanted the predicate to match the constraint being used?

Peter
Peter Bergner July 24, 2024, 5:16 p.m. UTC | #6
On 7/24/24 12:03 PM, Segher Boessenkool wrote:
>> +/* { dg-options "-mdejagnu-cpu=power10 -save-temps" } */
> 
> Why the -save-temps?  Always document it if you want that for something,
> but never put it in the testcase if not.  A leftover from development?

I can answer this one. :-).  Since these are dg-do run/link tests,
we need the -save-temps to keep the assembler files around for Carl's
scan-assembler tests at the end of the test case.

Peter
Segher Boessenkool July 24, 2024, 6:30 p.m. UTC | #7
On Wed, Jul 24, 2024 at 12:12:05PM -0500, Peter Bergner wrote:
> On 7/24/24 12:06 PM, Segher Boessenkool wrote:
> I thought we always wanted the predicate to match the constraint being used?

Predicates and constraints have different purposes, and are used at
different times (typically).  Everything before RA is predicates
only, and RA and everything after it use constraints (as well).

register_operand says it has to be a register.  It allows any
pseudo-register, so before RA, there is no real difference between
register_operand and altivec_register_operand (which allows all pseudos
as well)..

The constraint should not demand things that weren't clear earlier,
because that will then cause reloading eventually, often with less
efficient code.  It still will *work* though.

But that is not the case here :-)


Segher
Segher Boessenkool July 24, 2024, 6:31 p.m. UTC | #8
On Wed, Jul 24, 2024 at 12:16:33PM -0500, Peter Bergner wrote:
> 
> On 7/24/24 12:03 PM, Segher Boessenkool wrote:
> >> +/* { dg-options "-mdejagnu-cpu=power10 -save-temps" } */
> > 
> > Why the -save-temps?  Always document it if you want that for something,
> > but never put it in the testcase if not.  A leftover from development?
> 
> I can answer this one. :-).  Since these are dg-do run/link tests,
> we need the -save-temps to keep the assembler files around for Carl's
> scan-assembler tests at the end of the test case.

Ah!  Gotcha.  Please add a comment then?  Just something trivial, a
short line is enough :-)


Segher
Carl Love July 24, 2024, 6:38 p.m. UTC | #9
Segher:

Thanks for the review, a few questions...

On 7/24/24 10:03 AM, Segher Boessenkool wrote:
> Hi!
>
> So much manual stuff needed, sigh.
>
> On Fri, Jul 19, 2024 at 01:04:12PM -0700, Carl Love wrote:
>> gcc/ChangeLog:
>>      * config/rs6000/altivec.md (vs<SLDB_lr>db_<mode>): Change
>>      define_insn iterator to VEC_IC.
>  From VI2 (a nothing-saying name) to VEC_IC (also a nonsensical name).
>
> Maybe VEC_IC should have a comment explaining the TARGET_POWER10 thing
> at least?  Just something like "ISA 3.1 added 128-bit things" or
> whatever, but don't leave the reader second-guessing, a reader will
> often guess wrong :-)

I don't disagree that the reader will guess wrong, probably after being 
frustated that it isn't obvious.  :-)
The VEC_IC was an existing definition, this patch does not add it.  Your 
comments seems to imply you want a comment on the definition for VEC_IC 
in vector.md?  I could add one to the existing definition if you like 
but it seems outside the scope of this patch.

The change log entry could be improved to say "Change define_insn 
iterator to VEC_IC which included the V1TI type added in ISA 3.1." Would 
that address your concerns?
>
>> gcc/testsuite/ChangeLog:
>>      * gcc.target/powerpc/vec-shift-double-runnable-int128.c: New test
>>      file.
> Please don't line-wrap where not wanted.  Changelog lines are 80
> character positions wide.  (Or 79 if you want, but heh).

Yea, it does look like file will just fit on the same line.  Fixed.
>
>> +The above instances are extension of the exiting overloaded built-ins
> (existing)
Fixed spelling error.

>
>> a/gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable-int128.c
>> b/gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable-int128.c
>> new file mode 100644
>> index 00000000000..bb90f489149
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable-int128.c
>> @@ -0,0 +1,349 @@
>> +/* { dg-do run  { target { int128 } && { power10_hw } } } */
> Everything power10 is int128 always.

OK, so don't need the power10_hw.  Changed to just int128 for the target:

     /* { dg-do run  target int128  } */
>> +/* { dg-do link { target { ! power10_hw } } } */
>> +/* { dg-require-effective-target power10_ok } */
> So this is enough always.
>
> Often we have two testcases, one for run, one for compiling only.  It's
> a bit simpler and cleaner.

Sounds like you would prefer to have a run and a compile test file? I 
will create a new file vec-shift-double-int128.c  consisting of a series 
of functions to test each built-in definition.
>
>> +/* { dg-options "-mdejagnu-cpu=power10 -save-temps" } */
> Why the -save-temps?  Always document it if you want that for something,
> but never put it in the testcase if not.  A leftover from development?
>
> Okay for trunk, thank you!  Well Peter had some comments too, modulo
> those I guess, I'll read them now ;-)
So as Peter said, the save-temps is because the runnable case file also 
checks for assembler times at the end of the file.

I will move the scan-assembler-times checks to the new compile only test.

                                         Carl
Segher Boessenkool July 24, 2024, 6:47 p.m. UTC | #10
Hi!

On Wed, Jul 24, 2024 at 11:38:11AM -0700, Carl Love wrote:
> On 7/24/24 10:03 AM, Segher Boessenkool wrote:
> >So much manual stuff needed, sigh.
> >
> >On Fri, Jul 19, 2024 at 01:04:12PM -0700, Carl Love wrote:
> >>gcc/ChangeLog:
> >>     * config/rs6000/altivec.md (vs<SLDB_lr>db_<mode>): Change
> >>     define_insn iterator to VEC_IC.
> > From VI2 (a nothing-saying name) to VEC_IC (also a nonsensical name).
> >
> >Maybe VEC_IC should have a comment explaining the TARGET_POWER10 thing
> >at least?  Just something like "ISA 3.1 added 128-bit things" or
> >whatever, but don't leave the reader second-guessing, a reader will
> >often guess wrong :-)
> 
> I don't disagree that the reader will guess wrong, probably after being 
> frustated that it isn't obvious.  :-)
> The VEC_IC was an existing definition, this patch does not add it.  Your 
> comments seems to imply you want a comment on the definition for VEC_IC 
> in vector.md?  I could add one to the existing definition if you like 
> but it seems outside the scope of this patch.

Yes, I'm just lamenting the state of things :-)

It would have to be a separate patch, yes.  A trivial patch to add such
a comment is pre-approved :-)

> The change log entry could be improved to say "Change define_insn 
> iterator to VEC_IC which included the V1TI type added in ISA 3.1." Would 
> that address your concerns?

The current changelog is fine.  Changelogs never can replace comments in
the code.

> >>+/* { dg-do run  { target { int128 } && { power10_hw } } } */
> >Everything power10 is int128 always.
> 
> OK, so don't need the power10_hw.  Changed to just int128 for the target:

No, the other way around: you cannot run the code on machines without
these (ISA 3.1) instructions!

But p10 always satisfies the int128 predicate.  Although, hrm, how
about -m32 :-)

>     /* { dg-do run  target int128  } */
> >>+/* { dg-do link { target { ! power10_hw } } } */
> >>+/* { dg-require-effective-target power10_ok } */
> >So this is enough always.
> >
> >Often we have two testcases, one for run, one for compiling only.  It's
> >a bit simpler and cleaner.
> 
> Sounds like you would prefer to have a run and a compile test file? I 
> will create a new file vec-shift-double-int128.c  consisting of a series 
> of functions to test each built-in definition.

No, I don't prefer that, but it is easier to handle (also for you).
That that results in a bit more files, who cares, I don't anyway :-)

> >>+/* { dg-options "-mdejagnu-cpu=power10 -save-temps" } */
> >Why the -save-temps?  Always document it if you want that for something,
> >but never put it in the testcase if not.  A leftover from development?
> >
> >Okay for trunk, thank you!  Well Peter had some comments too, modulo
> >those I guess, I'll read them now ;-)
> So as Peter said, the save-temps is because the runnable case file also 
> checks for assembler times at the end of the file.

Yup.  A comment would help :-)


Segher
Kewen.Lin July 25, 2024, 8:21 a.m. UTC | #11
Hi Carl,

Some minor comments are inlined on top of Segher's and Peter's comments.

on 2024/7/20 04:04, Carl Love wrote:
> GCC developers:
> 
> The following patch adds the int128 varients to the existing overloaded built-ins vec_sld, vec_sldb, vec_sldw, vec_sll, vec_slo, vec_srdb, vec_srl, vec_sro.  These varients were requested by Steve Munroe.
> 
> The patch has been tested on a Power 10 system with no regressions.
> 
> Please let me know if the patch is acceptable for mainline.
> 
>                                    Carl
> 
> 
> -------------------------------------------------------------------------------------------------------
>  rs6000, Add new overloaded vector shift builtin int128 varients
> 
> Add the signed __int128 and unsigned __int128 argument types for the
> overloaded built-ins vec_sld, vec_sldb, vec_sldw, vec_sll, vec_slo,
> vec_srdb, vec_srl, vec_sro.  For each of the new argument types add a
> testcase and update the documentation for the built-in.
> 
> Add the missing internal names for the float and double types for
> overloaded builtin vec_sld for the float and double types.

This isn't needed, see below explanation.

> 
> gcc/ChangeLog:
>     * config/rs6000/altivec.md (vs<SLDB_lr>db_<mode>): Change
>     define_insn iterator to VEC_IC.
>     * config/rs6000/rs6000-builtins.def (__builtin_altivec_vsldoi_v1ti,
>     __builtin_vsx_xxsldwi_v1ti, __builtin_altivec_vsldb_v1ti,
>     __builtin_altivec_vsrdb_v1ti): New builtin definitions.
>     * config/rs6000/rs6000-overload.def (vec_sld, vec_sldb, vec_sldw,
>     vec_sll, vec_slo, vec_srdb, vec_srl, vec_sro): New overloaded
>     definitions.
>     (vec_sld): Add missing internal names.
>     * doc/extend.texi (vec_sld, vec_sldb, vec_sldw,    vec_sll, vec_slo,
>     vec_srdb, vec_srl, vec_sro): Add documentation for new overloaded
>     built-ins.
> 
> gcc/testsuite/ChangeLog:
>     * gcc.target/powerpc/vec-shift-double-runnable-int128.c: New test
>     file.
> ---
>  gcc/config/rs6000/altivec.md                  |   6 +-
>  gcc/config/rs6000/rs6000-builtins.def         |  12 +
>  gcc/config/rs6000/rs6000-overload.def         |  44 ++-
>  gcc/doc/extend.texi                           |  42 +++
>  .../vec-shift-double-runnable-int128.c        | 349 ++++++++++++++++++
>  5 files changed, 448 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable-int128.c
> 
> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index 5af9bf920a2..2a18ee44526 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -878,9 +878,9 @@ (define_int_attr SLDB_lr [(UNSPEC_SLDB "l")
>  (define_int_iterator VSHIFT_DBL_LR [UNSPEC_SLDB UNSPEC_SRDB])
> 
>  (define_insn "vs<SLDB_lr>db_<mode>"
> - [(set (match_operand:VI2 0 "register_operand" "=v")
> -  (unspec:VI2 [(match_operand:VI2 1 "register_operand" "v")
> -           (match_operand:VI2 2 "register_operand" "v")
> + [(set (match_operand:VEC_IC 0 "register_operand" "=v")
> +  (unspec:VEC_IC [(match_operand:VEC_IC 1 "register_operand" "v")
> +           (match_operand:VEC_IC 2 "register_operand" "v")
>             (match_operand:QI 3 "const_0_to_12_operand" "n")]
>            VSHIFT_DBL_LR))]
>    "TARGET_POWER10"
> diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
> index 77eb0f7e406..fbb6e1ddf85 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -964,6 +964,9 @@
>    const vss __builtin_altivec_vsldoi_8hi (vss, vss, const int<4>);
>      VSLDOI_8HI altivec_vsldoi_v8hi {}
> 
> +  const vsq __builtin_altivec_vsldoi_v1ti (vsq, vsq, const int<4>);
> +    VSLDOI_V1TI altivec_vsldoi_v1ti {}
> +
>    const vss __builtin_altivec_vslh (vss, vus);
>      VSLH vashlv8hi3 {}
> 
> @@ -1831,6 +1834,9 @@
>    const vsll __builtin_vsx_xxsldwi_2di (vsll, vsll, const int<2>);
>      XXSLDWI_2DI vsx_xxsldwi_v2di {}
> 
> +  const vsq __builtin_vsx_xxsldwi_v1ti (vsq, vsq, const int<2>);
> +    XXSLDWI_Q vsx_xxsldwi_v1ti {}
> +
>    const vf __builtin_vsx_xxsldwi_4sf (vf, vf, const int<2>);
>      XXSLDWI_4SF vsx_xxsldwi_v4sf {}
> 
> @@ -3299,6 +3305,9 @@
>    const vss __builtin_altivec_vsldb_v8hi (vss, vss, const int<3>);
>      VSLDB_V8HI vsldb_v8hi {}
> 
> +  const vsq __builtin_altivec_vsldb_v1ti (vsq, vsq, const int<3>);
> +    VSLDB_V1TI vsldb_v1ti {}
> +
>    const vsq __builtin_altivec_vslq (vsq, vuq);
>      VSLQ vashlv1ti3 {}
> 
> @@ -3317,6 +3326,9 @@
>    const vss __builtin_altivec_vsrdb_v8hi (vss, vss, const int<3>);
>      VSRDB_V8HI vsrdb_v8hi {}
> 
> +  const vsq __builtin_altivec_vsrdb_v1ti (vsq, vsq, const int<3>);
> +    VSRDB_V1TI vsrdb_v1ti {}
> +
>    const vsq __builtin_altivec_vsrq (vsq, vuq);
>      VSRQ vlshrv1ti3 {}
> 
> diff --git a/gcc/config/rs6000/rs6000-overload.def b/gcc/config/rs6000/rs6000-overload.def
> index c4ecafc6f7e..302e0232533 100644
> --- a/gcc/config/rs6000/rs6000-overload.def
> +++ b/gcc/config/rs6000/rs6000-overload.def
> @@ -3396,9 +3396,13 @@
>    vull __builtin_vec_sld (vull, vull, const int);
>      VSLDOI_2DI  VSLDOI_VULL
>    vf __builtin_vec_sld (vf, vf, const int);
> -    VSLDOI_4SF
> +    VSLDOI_4SF VSLDOI_VF
>    vd __builtin_vec_sld (vd, vd, const int);
> -    VSLDOI_2DF
> +    VSLDOI_2DF VSLDOI_VD

The other instances for vector integer type have multiple uses of 1st token,
such as:

  vsll __builtin_vec_sld (vsll, vsll, const int);
    VSLDOI_2DI  VSLDOI_VSLL
  vbll __builtin_vec_sld (vbll, vbll, const int);
    VSLDOI_2DI  VSLDOI_VBLL
  vull __builtin_vec_sld (vull, vull, const int);
    VSLDOI_2DI  VSLDOI_VULL

, it's unable to use the 1st token VSLDOI_2DI for the overload id (otherwise
it can be ambiguous), but for vector float/double they don't have multiple
variants, VSLDOI_4SF and VSLDOI_2DF are used once respectively so they are
fine here.  I think the existing code is intentional so let's keep them
unchanged (creating more unnecessary ids is slightly worse than before).

> +  vsq __builtin_vec_sld (vsq, vsq, const int);
> +    VSLDOI_V1TI  VSLDOI_VSQ
> +  vuq __builtin_vec_sld (vuq, vuq, const int);
> +    VSLDOI_V1TI  VSLDOI_VUQ
> 
>  [VEC_SLDB, vec_sldb, __builtin_vec_sldb]
>    vsc __builtin_vec_sldb (vsc, vsc, const int);
> @@ -3417,6 +3421,10 @@
>      VSLDB_V2DI  VSLDB_VSLL
>    vull __builtin_vec_sldb (vull, vull, const int);
>      VSLDB_V2DI  VSLDB_VULL
> +  vsq __builtin_vec_sldb (vsq, vsq, const int);
> +    VSLDB_V1TI  VSLDB_VSQ
> +  vuq __builtin_vec_sldb (vuq, vuq, const int);
> +    VSLDB_V1TI  VSLDB_VUQ
> 
>  [VEC_SLDW, vec_sldw, __builtin_vec_sldw]
>    vsc __builtin_vec_sldw (vsc, vsc, const int);
> @@ -3439,6 +3447,10 @@
>      XXSLDWI_4SF  XXSLDWI_VF
>    vd __builtin_vec_sldw (vd, vd, const int);
>      XXSLDWI_2DF  XXSLDWI_VD
> +  vsq __builtin_vec_sldw (vsq, vsq, const int);
> +    XXSLDWI_Q  XXSLDWI_VSQ
> +  vuq __builtin_vec_sldw (vuq, vuq, const int);
> +    XXSLDWI_Q  XXSLDWI_VUQ

Nit: s/XXSLDWI_Q/XXSLDWI_1TI/ to keep consistent with the
other XXSLDWI_* as 1st token (XXSLDWI_16QI etc. are used
above rather than XXSLDWI_{SC,UC} etc.)

> 
>  [VEC_SLL, vec_sll, __builtin_vec_sll]
>    vsc __builtin_vec_sll (vsc, vuc);
> @@ -3459,6 +3471,10 @@
>      VSL  VSL_VSLL
>    vull __builtin_vec_sll (vull, vuc);
>      VSL  VSL_VULL
> +  vsq __builtin_vec_sll (vsq, vuc);
> +    VSL  VSL_VSQ
> +  vuq __builtin_vec_sll (vuq, vuc);
> +    VSL  VSL_VUQ
>  ; The following variants are deprecated.
>    vsc __builtin_vec_sll (vsc, vus);
>      VSL  VSL_VSC_VUS
> @@ -3554,6 +3570,14 @@
>      VSLO  VSLO_VFS
>    vf __builtin_vec_slo (vf, vuc);
>      VSLO  VSLO_VFU
> +  vsq __builtin_vec_slo (vsq, vsc);
> +    VSLO  VSLDO_VSQS
> +  vsq __builtin_vec_slo (vsq, vuc);
> +    VSLO  VSLDO_VSQU
> +  vuq __builtin_vec_slo (vuq, vsc);
> +    VSLO  VSLDO_VUQS
> +  vuq __builtin_vec_slo (vuq, vuc);
> +    VSLO  VSLDO_VUQU
> 
>  [VEC_SLV, vec_slv, __builtin_vec_vslv]
>    vuc __builtin_vec_vslv (vuc, vuc);
> @@ -3699,6 +3723,10 @@
>      VSRDB_V2DI  VSRDB_VSLL
>    vull __builtin_vec_srdb (vull, vull, const int);
>      VSRDB_V2DI  VSRDB_VULL
> +  vsq __builtin_vec_srdb (vsq, vsq, const int);
> +    VSRDB_V1TI  VSRDB_VSQ
> +  vuq __builtin_vec_srdb (vuq, vuq, const int);
> +    VSRDB_V1TI  VSRDB_VUQ
> 
>  [VEC_SRL, vec_srl, __builtin_vec_srl]
>    vsc __builtin_vec_srl (vsc, vuc);
> @@ -3719,6 +3747,10 @@
>      VSR  VSR_VSLL
>    vull __builtin_vec_srl (vull, vuc);
>      VSR  VSR_VULL
> +  vsq __builtin_vec_srl (vsq, vuc);
> +    VSR  VSR_VSQ
> +  vuq __builtin_vec_srl (vuq, vuc);
> +    VSR  VSR_VUQ
>  ; The following variants are deprecated.
>    vsc __builtin_vec_srl (vsc, vus);
>      VSR  VSR_VSC_VUS
> @@ -3808,6 +3840,14 @@
>      VSRO  VSRO_VFS
>    vf __builtin_vec_sro (vf, vuc);
>      VSRO  VSRO_VFU
> +  vsq __builtin_vec_sro (vsq, vsc);
> +    VSRO  VSRDO_VSQS
> +  vsq __builtin_vec_sro (vsq, vuc);
> +    VSRO  VSRDO_VSQU
> +  vuq __builtin_vec_sro (vuq, vsc);
> +    VSRO  VSRDO_VUQS
> +  vuq __builtin_vec_sro (vuq, vuc);
> +    VSRO  VSRDO_VUQU
> 
>  [VEC_SRV, vec_srv, __builtin_vec_vsrv]
>    vuc __builtin_vec_vsrv (vuc, vuc);
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index 0b572afca72..5125a6d9def 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -23504,6 +23504,10 @@ const unsigned int);
>  vector signed long long, const unsigned int);
>  @exdent vector unsigned long long vec_sldb (vector unsigned long long,
>  vector unsigned long long, const unsigned int);
> +@exdent vector signed __int128 vec_sldb (vector signed __int128,
> +vector signed __int128, const unsigned int);
> +@exdent vector unsigned __int128 vec_sldb (vector unsigned __int128,
> +vector unsigned __int128, const unsigned int);
>  @end smallexample
> 
>  Shift the combined input vectors left by the amount specified by the low-order
> @@ -23531,6 +23535,10 @@ const unsigned int);
>  vector signed long long, const unsigned int);
>  @exdent vector unsigned long long vec_srdb (vector unsigned long long,
>  vector unsigned long long, const unsigned int);
> +@exdent vector signed __int128 vec_srdb (vector signed __int128,
> +vector signed __int128, const unsigned int);
> +@exdent vector unsigned __int128 vec_srdb (vector unsigned __int128,
> +vector unsigned __int128, const unsigned int);
>  @end smallexample
> 
>  Shift the combined input vectors right by the amount specified by the low-order
> @@ -24025,6 +24033,40 @@ int vec_any_le (vector signed __int128, vector signed __int128);
>  int vec_any_le (vector unsigned __int128, vector unsigned __int128);
>  @end smallexample
> 

Personally I prefer to move the below paragraph for description here to avoid
that two @smallexample sections are placed together, that is:

"The following instances are extension of the existing overloaded built-ins
@code{vec_sld}, @code{vec_sldw}, @code{vec_slo}, @code{vec_sro}, @code{vec_srl}
that are documented in the PVIPR."

@smallexample
....

BR,
Kewen
Carl Love July 25, 2024, 11:53 p.m. UTC | #12
Kewen:

On 7/25/24 1:21 AM, Kewen.Lin wrote:
> Hi Carl,
>
> Some minor comments are inlined on top of Segher's and Peter's comments.
>
> on 2024/7/20 04:04, Carl Love wrote:
>> GCC developers:
>>
>> The following patch adds the int128 varients to the existing overloaded built-ins vec_sld, vec_sldb, vec_sldw, vec_sll, vec_slo, vec_srdb, vec_srl, vec_sro.  These varients were requested by Steve Munroe.
>>
>> The patch has been tested on a Power 10 system with no regressions.
>>
>> Please let me know if the patch is acceptable for mainline.
>>
>>                                     Carl
>>
>>
>> -------------------------------------------------------------------------------------------------------
>>   rs6000, Add new overloaded vector shift builtin int128 varients
>>
>> Add the signed __int128 and unsigned __int128 argument types for the
>> overloaded built-ins vec_sld, vec_sldb, vec_sldw, vec_sll, vec_slo,
>> vec_srdb, vec_srl, vec_sro.  For each of the new argument types add a
>> testcase and update the documentation for the built-in.
>>
>> Add the missing internal names for the float and double types for
>> overloaded builtin vec_sld for the float and double types.
> This isn't needed, see below explanation.

OK, per comments below, removed the additional internal names.

<snip>
>> diff --git a/gcc/config/rs6000/rs6000-overload.def b/gcc/config/rs6000/rs6000-overload.def
>> index c4ecafc6f7e..302e0232533 100644
>> --- a/gcc/config/rs6000/rs6000-overload.def
>> +++ b/gcc/config/rs6000/rs6000-overload.def
>> @@ -3396,9 +3396,13 @@
>>     vull __builtin_vec_sld (vull, vull, const int);
>>       VSLDOI_2DI  VSLDOI_VULL
>>     vf __builtin_vec_sld (vf, vf, const int);
>> -    VSLDOI_4SF
>> +    VSLDOI_4SF VSLDOI_VF
>>     vd __builtin_vec_sld (vd, vd, const int);
>> -    VSLDOI_2DF
>> +    VSLDOI_2DF VSLDOI_VD
> The other instances for vector integer type have multiple uses of 1st token,
> such as:
>
>    vsll __builtin_vec_sld (vsll, vsll, const int);
>      VSLDOI_2DI  VSLDOI_VSLL
>    vbll __builtin_vec_sld (vbll, vbll, const int);
>      VSLDOI_2DI  VSLDOI_VBLL
>    vull __builtin_vec_sld (vull, vull, const int);
>      VSLDOI_2DI  VSLDOI_VULL
>
> , it's unable to use the 1st token VSLDOI_2DI for the overload id (otherwise
> it can be ambiguous), but for vector float/double they don't have multiple
> variants, VSLDOI_4SF and VSLDOI_2DF are used once respectively so they are
> fine here.  I think the existing code is intentional so let's keep them
> unchanged (creating more unnecessary ids is slightly worse than before).

OK, removed the addtional tokens VSLDOI_VF and VSLDOI_VD.
>
>> +  vsq __builtin_vec_sld (vsq, vsq, const int);
>> +    VSLDOI_V1TI  VSLDOI_VSQ
>> +  vuq __builtin_vec_sld (vuq, vuq, const int);
>> +    VSLDOI_V1TI  VSLDOI_VUQ
>>
>>   [VEC_SLDB, vec_sldb, __builtin_vec_sldb]
>>     vsc __builtin_vec_sldb (vsc, vsc, const int);
>> @@ -3417,6 +3421,10 @@
>>       VSLDB_V2DI  VSLDB_VSLL
>>     vull __builtin_vec_sldb (vull, vull, const int);
>>       VSLDB_V2DI  VSLDB_VULL
>> +  vsq __builtin_vec_sldb (vsq, vsq, const int);
>> +    VSLDB_V1TI  VSLDB_VSQ
>> +  vuq __builtin_vec_sldb (vuq, vuq, const int);
>> +    VSLDB_V1TI  VSLDB_VUQ
>>
>>   [VEC_SLDW, vec_sldw, __builtin_vec_sldw]
>>     vsc __builtin_vec_sldw (vsc, vsc, const int);
>> @@ -3439,6 +3447,10 @@
>>       XXSLDWI_4SF  XXSLDWI_VF
>>     vd __builtin_vec_sldw (vd, vd, const int);
>>       XXSLDWI_2DF  XXSLDWI_VD
>> +  vsq __builtin_vec_sldw (vsq, vsq, const int);
>> +    XXSLDWI_Q  XXSLDWI_VSQ
>> +  vuq __builtin_vec_sldw (vuq, vuq, const int);
>> +    XXSLDWI_Q  XXSLDWI_VUQ
> Nit: s/XXSLDWI_Q/XXSLDWI_1TI/ to keep consistent with the
> other XXSLDWI_* as 1st token (XXSLDWI_16QI etc. are used
> above rather than XXSLDWI_{SC,UC} etc.)

OK, changed to:

   vsq __builtin_vec_sldw (vsq, vsq, const int);
     XXSLDWI_1TI  XXSLDWI_VSQ
   vuq __builtin_vec_sldw (vuq, vuq, const int);
     XXSLDWI_1TI  XXSLDWI_VUQ

<snip>

>>   [VEC_SRV, vec_srv, __builtin_vec_vsrv]
>>     vuc __builtin_vec_vsrv (vuc, vuc);
>> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
>> index 0b572afca72..5125a6d9def 100644
>> --- a/gcc/doc/extend.texi
>> +++ b/gcc/doc/extend.texi
>> @@ -23504,6 +23504,10 @@ const unsigned int);
>>   vector signed long long, const unsigned int);
>>   @exdent vector unsigned long long vec_sldb (vector unsigned long long,
>>   vector unsigned long long, const unsigned int);
>> +@exdent vector signed __int128 vec_sldb (vector signed __int128,
>> +vector signed __int128, const unsigned int);
>> +@exdent vector unsigned __int128 vec_sldb (vector unsigned __int128,
>> +vector unsigned __int128, const unsigned int);
>>   @end smallexample
>>
>>   Shift the combined input vectors left by the amount specified by the low-order
>> @@ -23531,6 +23535,10 @@ const unsigned int);
>>   vector signed long long, const unsigned int);
>>   @exdent vector unsigned long long vec_srdb (vector unsigned long long,
>>   vector unsigned long long, const unsigned int);
>> +@exdent vector signed __int128 vec_srdb (vector signed __int128,
>> +vector signed __int128, const unsigned int);
>> +@exdent vector unsigned __int128 vec_srdb (vector unsigned __int128,
>> +vector unsigned __int128, const unsigned int);
>>   @end smallexample
>>
>>   Shift the combined input vectors right by the amount specified by the low-order
>> @@ -24025,6 +24033,40 @@ int vec_any_le (vector signed __int128, vector signed __int128);
>>   int vec_any_le (vector unsigned __int128, vector unsigned __int128);
>>   @end smallexample
>>
> Personally I prefer to move the below paragraph for description here to avoid
> that two @smallexample sections are placed together, that is:
>
> "The following instances are extension of the existing overloaded built-ins
> @code{vec_sld}, @code{vec_sldw}, @code{vec_slo}, @code{vec_sro}, @code{vec_srl}
> that are documented in the PVIPR."
>
> @smallexample
> ....

OK, moved the changes for the extensions to the IPVR to right after the 
new vec_srb entries.

                        Carl
Carl Love July 25, 2024, 11:53 p.m. UTC | #13
Peter, Segher:

On 7/23/24 2:26 PM, Peter Bergner wrote:
> On 7/19/24 3:04 PM, Carl Love wrote:
>> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
>> index 5af9bf920a2..2a18ee44526 100644
>> --- a/gcc/config/rs6000/altivec.md
>> +++ b/gcc/config/rs6000/altivec.md
>> @@ -878,9 +878,9 @@ (define_int_attr SLDB_lr [(UNSPEC_SLDB "l")
>>   (define_int_iterator VSHIFT_DBL_LR [UNSPEC_SLDB UNSPEC_SRDB])
>>
>>   (define_insn "vs<SLDB_lr>db_<mode>"
>> - [(set (match_operand:VI2 0 "register_operand" "=v")
>> -  (unspec:VI2 [(match_operand:VI2 1 "register_operand" "v")
>> -           (match_operand:VI2 2 "register_operand" "v")
>> + [(set (match_operand:VEC_IC 0 "register_operand" "=v")
>> +  (unspec:VEC_IC [(match_operand:VEC_IC 1 "register_operand" "v")
>> +           (match_operand:VEC_IC 2 "register_operand" "v")
>>              (match_operand:QI 3 "const_0_to_12_operand" "n")]
>>             VSHIFT_DBL_LR))]
>>     "TARGET_POWER10"
> I know the old code used the register_operand predicate for the vector
> operands, but those really should be changed to altivec_register_operand.
>
> Peter
Segher's response was:

register_operand is just fine usually.  The "v" constraint already makes
sure things end up in a VMX (a lower VSX) register, the predicate
doesn't help here.  register_operand is shorter (and thus, preferred),
and also more likely correct if the code changes later 🙂


Which Peter said and Segher responded:

On Wed, Jul 24, 2024 at 12:12:05PM -0500, Peter Bergner wrote:

> On 7/24/24 12:06 PM, Segher Boessenkool wrote:
> I thought we always wanted the predicate to match the constraint being used?

Predicates and constraints have different purposes, and are used at
different times (typically).  Everything before RA is predicates
only, and RA and everything after it use constraints (as well).

register_operand says it has to be a register.  It allows any
pseudo-register, so before RA, there is no real difference between
register_operand and altivec_register_operand (which allows all pseudos
as well)..

The constraint should not demand things that weren't clear earlier,
because that will then cause reloading eventually, often with less
efficient code.  It still will*work*  though.

But that is not the case here 🙂


So, I think the final word here is don't change it.

                  Carl

>
Carl Love July 26, 2024, 5:07 p.m. UTC | #14
Segher:

On 7/24/24 11:47 AM, Segher Boessenkool wrote:
> Hi!
>
> On Wed, Jul 24, 2024 at 11:38:11AM -0700, Carl Love wrote:
>> On 7/24/24 10:03 AM, Segher Boessenkool wrote:
>>> So much manual stuff needed, sigh.
>>>
>>> On Fri, Jul 19, 2024 at 01:04:12PM -0700, Carl Love wrote:
>>>> gcc/ChangeLog:
>>>>      * config/rs6000/altivec.md (vs<SLDB_lr>db_<mode>): Change
>>>>      define_insn iterator to VEC_IC.
>>>  From VI2 (a nothing-saying name) to VEC_IC (also a nonsensical name).
>>>
>>> Maybe VEC_IC should have a comment explaining the TARGET_POWER10 thing
>>> at least?  Just something like "ISA 3.1 added 128-bit things" or
>>> whatever, but don't leave the reader second-guessing, a reader will
>>> often guess wrong :-)
>> I don't disagree that the reader will guess wrong, probably after being
>> frustated that it isn't obvious.  :-)
>> The VEC_IC was an existing definition, this patch does not add it.  Your
>> comments seems to imply you want a comment on the definition for VEC_IC
>> in vector.md?  I could add one to the existing definition if you like
>> but it seems outside the scope of this patch.
> Yes, I'm just lamenting the state of things :-)
>
> It would have to be a separate patch, yes.  A trivial patch to add such
> a comment is pre-approved :-)

OK, no changes in this patch.  Will do a second patch.  Best that I post 
it for review anyway.

>
>> The change log entry could be improved to say "Change define_insn
>> iterator to VEC_IC which included the V1TI type added in ISA 3.1." Would
>> that address your concerns?
> The current changelog is fine.  Changelogs never can replace comments in
> the code.

OK

>>>> +/* { dg-do run  { target { int128 } && { power10_hw } } } */
>>> Everything power10 is int128 always.
>> OK, so don't need the power10_hw.  Changed to just int128 for the target:
> No, the other way around: you cannot run the code on machines without
> these (ISA 3.1) instructions!

OK, made it { dg-do run  { target power10_hw } } */
>
> But p10 always satisfies the int128 predicate.  Although, hrm, how
> about -m32 :-)

If I test with -m64 I get 8 passes:

    make -j 1 && make check-gcc RUNTESTFLAGS="-v -v 
powerpc.exp=vec-shift-double-runnable-int128.c --target_board=unix'{-m64}'"
   # of expected passes            8

If I test with -m32 I get unsupported test,

    make -j 1 && make check-gcc RUNTESTFLAGS="-v -v 
powerpc.exp=vec-shift-double-runnable-int128.c --target_board=unix'{-m32}'"
     # of unsupported tests          1

Looking further into the output the checks say:
    Checking pattern "hppa*-*-hpux*" with powerpc64le-unknown-linux-gnu
    compiler exited with status 1
    output is:
    cc1: error: '-m32' not supported in this configuration^M

    check_cached_effective_target power10_hw_available: returning 0 for 
unix/-m32
    is-effective-target: power10_hw 0

Looks like the power10_hw is sufficient to prevent the test from 
running.  Don't need to explicitly check for int128 as well.


>
>>      /* { dg-do run  target int128  } */
>>>> +/* { dg-do link { target { ! power10_hw } } } */
>>>> +/* { dg-require-effective-target power10_ok } */
>>> So this is enough always.
>>>
>>> Often we have two testcases, one for run, one for compiling only.  It's
>>> a bit simpler and cleaner.
>> Sounds like you would prefer to have a run and a compile test file? I
>> will create a new file vec-shift-double-int128.c  consisting of a series
>> of functions to test each built-in definition.
> No, I don't prefer that, but it is easier to handle (also for you).
> That that results in a bit more files, who cares, I don't anyway :-)
>
>>>> +/* { dg-options "-mdejagnu-cpu=power10 -save-temps" } */
>>> Why the -save-temps?  Always document it if you want that for something,
>>> but never put it in the testcase if not.  A leftover from development?
>>>
>>> Okay for trunk, thank you!  Well Peter had some comments too, modulo
>>> those I guess, I'll read them now ;-)
>> So as Peter said, the save-temps is because the runnable case file also
>> checks for assembler times at the end of the file.
> Yup.  A comment would help :-)

OK, added comment.

/* { dg-require-effective-target power10_ok } */

/* Need -save-temps for dg-final scan-assembler-times at end of test.  */
/* { dg-options "-mdejagnu-cpu=power10 -save-temps" } */


                                       Carl
Peter Bergner July 26, 2024, 6:17 p.m. UTC | #15
On 7/26/24 12:07 PM, Carl Love wrote:
> On 7/24/24 11:47 AM, Segher Boessenkool wrote:
>>>>> +/* { dg-do run  { target { int128 } && { power10_hw } } } */
>>>> Everything power10 is int128 always.
>>> OK, so don't need the power10_hw.  Changed to just int128 for the target:
>> No, the other way around: you cannot run the code on machines without
>> these (ISA 3.1) instructions!
> 
> OK, made it { dg-do run  { target power10_hw } } */
>>
>> But p10 always satisfies the int128 predicate.  Although, hrm, how
>> about -m32 :-)
> 
> If I test with -m64 I get 8 passes:
> 
>    make -j 1 && make check-gcc RUNTESTFLAGS="-v -v powerpc.exp=vec-shift-double-runnable-int128.c --target_board=unix'{-m64}'"
>   # of expected passes            8
> 
> If I test with -m32 I get unsupported test,
> 
>    make -j 1 && make check-gcc RUNTESTFLAGS="-v -v powerpc.exp=vec-shift-double-runnable-int128.c --target_board=unix'{-m32}'"
>     # of unsupported tests          1
> 
> Looking further into the output the checks say:
>    Checking pattern "hppa*-*-hpux*" with powerpc64le-unknown-linux-gnu
>    compiler exited with status 1
>    output is:
>    cc1: error: '-m32' not supported in this configuration^M
> 
>    check_cached_effective_target power10_hw_available: returning 0 for unix/-m32
>    is-effective-target: power10_hw 0
> 
> Looks like the power10_hw is sufficient to prevent the test from running.  Don't need to explicitly check for int128 as well.

That's only because you're testing on LE where -m32 is a not supported
configuration.  If you tested on BE, you'd probably see a FAIL for -m32.
I think your initial target tests with int128 and power10_hw is actually
correct.  We have an internal Power10 partition (ltcd97-lp7) running a
BE distro you can test this on.

Peter
diff mbox series

Patch

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 5af9bf920a2..2a18ee44526 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -878,9 +878,9 @@  (define_int_attr SLDB_lr [(UNSPEC_SLDB "l")
  (define_int_iterator VSHIFT_DBL_LR [UNSPEC_SLDB UNSPEC_SRDB])

  (define_insn "vs<SLDB_lr>db_<mode>"
- [(set (match_operand:VI2 0 "register_operand" "=v")
-  (unspec:VI2 [(match_operand:VI2 1 "register_operand" "v")
-           (match_operand:VI2 2 "register_operand" "v")
+ [(set (match_operand:VEC_IC 0 "register_operand" "=v")
+  (unspec:VEC_IC [(match_operand:VEC_IC 1 "register_operand" "v")
+           (match_operand:VEC_IC 2 "register_operand" "v")
             (match_operand:QI 3 "const_0_to_12_operand" "n")]
            VSHIFT_DBL_LR))]
    "TARGET_POWER10"
diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 77eb0f7e406..fbb6e1ddf85 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -964,6 +964,9 @@ 
    const vss __builtin_altivec_vsldoi_8hi (vss, vss, const int<4>);
      VSLDOI_8HI altivec_vsldoi_v8hi {}

+  const vsq __builtin_altivec_vsldoi_v1ti (vsq, vsq, const int<4>);
+    VSLDOI_V1TI altivec_vsldoi_v1ti {}
+
    const vss __builtin_altivec_vslh (vss, vus);
      VSLH vashlv8hi3 {}

@@ -1831,6 +1834,9 @@ 
    const vsll __builtin_vsx_xxsldwi_2di (vsll, vsll, const int<2>);
      XXSLDWI_2DI vsx_xxsldwi_v2di {}

+  const vsq __builtin_vsx_xxsldwi_v1ti (vsq, vsq, const int<2>);
+    XXSLDWI_Q vsx_xxsldwi_v1ti {}
+
    const vf __builtin_vsx_xxsldwi_4sf (vf, vf, const int<2>);
      XXSLDWI_4SF vsx_xxsldwi_v4sf {}

@@ -3299,6 +3305,9 @@ 
    const vss __builtin_altivec_vsldb_v8hi (vss, vss, const int<3>);
      VSLDB_V8HI vsldb_v8hi {}

+  const vsq __builtin_altivec_vsldb_v1ti (vsq, vsq, const int<3>);
+    VSLDB_V1TI vsldb_v1ti {}
+
    const vsq __builtin_altivec_vslq (vsq, vuq);
      VSLQ vashlv1ti3 {}

@@ -3317,6 +3326,9 @@ 
    const vss __builtin_altivec_vsrdb_v8hi (vss, vss, const int<3>);
      VSRDB_V8HI vsrdb_v8hi {}

+  const vsq __builtin_altivec_vsrdb_v1ti (vsq, vsq, const int<3>);
+    VSRDB_V1TI vsrdb_v1ti {}
+
    const vsq __builtin_altivec_vsrq (vsq, vuq);
      VSRQ vlshrv1ti3 {}

diff --git a/gcc/config/rs6000/rs6000-overload.def 
b/gcc/config/rs6000/rs6000-overload.def
index c4ecafc6f7e..302e0232533 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -3396,9 +3396,13 @@ 
    vull __builtin_vec_sld (vull, vull, const int);
      VSLDOI_2DI  VSLDOI_VULL
    vf __builtin_vec_sld (vf, vf, const int);
-    VSLDOI_4SF
+    VSLDOI_4SF VSLDOI_VF
    vd __builtin_vec_sld (vd, vd, const int);
-    VSLDOI_2DF
+    VSLDOI_2DF VSLDOI_VD
+  vsq __builtin_vec_sld (vsq, vsq, const int);
+    VSLDOI_V1TI  VSLDOI_VSQ
+  vuq __builtin_vec_sld (vuq, vuq, const int);
+    VSLDOI_V1TI  VSLDOI_VUQ

  [VEC_SLDB, vec_sldb, __builtin_vec_sldb]
    vsc __builtin_vec_sldb (vsc, vsc, const int);
@@ -3417,6 +3421,10 @@ 
      VSLDB_V2DI  VSLDB_VSLL
    vull __builtin_vec_sldb (vull, vull, const int);
      VSLDB_V2DI  VSLDB_VULL
+  vsq __builtin_vec_sldb (vsq, vsq, const int);
+    VSLDB_V1TI  VSLDB_VSQ
+  vuq __builtin_vec_sldb (vuq, vuq, const int);
+    VSLDB_V1TI  VSLDB_VUQ

  [VEC_SLDW, vec_sldw, __builtin_vec_sldw]
    vsc __builtin_vec_sldw (vsc, vsc, const int);
@@ -3439,6 +3447,10 @@ 
      XXSLDWI_4SF  XXSLDWI_VF
    vd __builtin_vec_sldw (vd, vd, const int);
      XXSLDWI_2DF  XXSLDWI_VD
+  vsq __builtin_vec_sldw (vsq, vsq, const int);
+    XXSLDWI_Q  XXSLDWI_VSQ
+  vuq __builtin_vec_sldw (vuq, vuq, const int);
+    XXSLDWI_Q  XXSLDWI_VUQ

  [VEC_SLL, vec_sll, __builtin_vec_sll]
    vsc __builtin_vec_sll (vsc, vuc);
@@ -3459,6 +3471,10 @@ 
      VSL  VSL_VSLL
    vull __builtin_vec_sll (vull, vuc);
      VSL  VSL_VULL
+  vsq __builtin_vec_sll (vsq, vuc);
+    VSL  VSL_VSQ
+  vuq __builtin_vec_sll (vuq, vuc);
+    VSL  VSL_VUQ
  ; The following variants are deprecated.
    vsc __builtin_vec_sll (vsc, vus);
      VSL  VSL_VSC_VUS
@@ -3554,6 +3570,14 @@ 
      VSLO  VSLO_VFS
    vf __builtin_vec_slo (vf, vuc);
      VSLO  VSLO_VFU
+  vsq __builtin_vec_slo (vsq, vsc);
+    VSLO  VSLDO_VSQS
+  vsq __builtin_vec_slo (vsq, vuc);
+    VSLO  VSLDO_VSQU
+  vuq __builtin_vec_slo (vuq, vsc);
+    VSLO  VSLDO_VUQS
+  vuq __builtin_vec_slo (vuq, vuc);
+    VSLO  VSLDO_VUQU

  [VEC_SLV, vec_slv, __builtin_vec_vslv]
    vuc __builtin_vec_vslv (vuc, vuc);
@@ -3699,6 +3723,10 @@ 
      VSRDB_V2DI  VSRDB_VSLL
    vull __builtin_vec_srdb (vull, vull, const int);
      VSRDB_V2DI  VSRDB_VULL
+  vsq __builtin_vec_srdb (vsq, vsq, const int);
+    VSRDB_V1TI  VSRDB_VSQ
+  vuq __builtin_vec_srdb (vuq, vuq, const int);
+    VSRDB_V1TI  VSRDB_VUQ

  [VEC_SRL, vec_srl, __builtin_vec_srl]
    vsc __builtin_vec_srl (vsc, vuc);
@@ -3719,6 +3747,10 @@ 
      VSR  VSR_VSLL
    vull __builtin_vec_srl (vull, vuc);
      VSR  VSR_VULL
+  vsq __builtin_vec_srl (vsq, vuc);
+    VSR  VSR_VSQ
+  vuq __builtin_vec_srl (vuq, vuc);
+    VSR  VSR_VUQ
  ; The following variants are deprecated.
    vsc __builtin_vec_srl (vsc, vus);
      VSR  VSR_VSC_VUS
@@ -3808,6 +3840,14 @@ 
      VSRO  VSRO_VFS
    vf __builtin_vec_sro (vf, vuc);
      VSRO  VSRO_VFU
+  vsq __builtin_vec_sro (vsq, vsc);
+    VSRO  VSRDO_VSQS
+  vsq __builtin_vec_sro (vsq, vuc);
+    VSRO  VSRDO_VSQU
+  vuq __builtin_vec_sro (vuq, vsc);
+    VSRO  VSRDO_VUQS
+  vuq __builtin_vec_sro (vuq, vuc);
+    VSRO  VSRDO_VUQU

  [VEC_SRV, vec_srv, __builtin_vec_vsrv]
    vuc __builtin_vec_vsrv (vuc, vuc);
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 0b572afca72..5125a6d9def 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -23504,6 +23504,10 @@  const unsigned int);
  vector signed long long, const unsigned int);
  @exdent vector unsigned long long vec_sldb (vector unsigned long long,
  vector unsigned long long, const unsigned int);
+@exdent vector signed __int128 vec_sldb (vector signed __int128,
+vector signed __int128, const unsigned int);
+@exdent vector unsigned __int128 vec_sldb (vector unsigned __int128,
+vector unsigned __int128, const unsigned int);
  @end smallexample

  Shift the combined input vectors left by the amount specified by the 
low-order
@@ -23531,6 +23535,10 @@  const unsigned int);
  vector signed long long, const unsigned int);
  @exdent vector unsigned long long vec_srdb (vector unsigned long long,
  vector unsigned long long, const unsigned int);
+@exdent vector signed __int128 vec_srdb (vector signed __int128,
+vector signed __int128, const unsigned int);
+@exdent vector unsigned __int128 vec_srdb (vector unsigned __int128,
+vector unsigned __int128, const unsigned int);
  @end smallexample

  Shift the combined input vectors right by the amount specified by the 
low-order
@@ -24025,6 +24033,40 @@  int vec_any_le (vector signed __int128, vector 
signed __int128);
  int vec_any_le (vector unsigned __int128, vector unsigned __int128);
  @end smallexample

+@smallexample
+@exdent vector signed __int128 vec_sld (vector signed __int128,
+vector signed __int128, const unsigned int);
+@exdent vector unsigned __int128 vec_sld (vector unsigned __int128,
+vector unsigned __int128, const unsigned int);
+@exdent vector signed __int128 vec_sldw (vector signed __int128,
+vector signed __int128, const unsigned int);
+@exdent vector unsigned __int128 vec_sldw (vector unsigned __int,
+vector unsigned __int128, const unsigned int);
+@exdent vector signed __int128 vec_slo (vector signed __int128,
+vector signed char);
+@exdent vector signed __int128 vec_slo (vector signed __int128,
+vector unsigned char);
+@exdent vector unsigned __int128 vec_slo (vector unsigned __int128,
+vector signed char);
+@exdent vector unsigned __int128 vec_slo (vector unsigned __int128,
+vector unsigned char);
+@exdent vector signed __int128 vec_sro (vector signed __int128,
+vector signed char);
+@exdent vector signed __int128 vec_sro (vector signed __int128,
+vector unsigned char);
+@exdent vector unsigned __int128 vec_sro (vector unsigned __int128,
+vector signed char);
+@exdent vector unsigned __int128 vec_sro (vector unsigned __int128,
+vector unsigned char);
+@exdent vector signed __int128 vec_srl (vector signed __int128,
+vector unsigned char);
+@exdent vector unsigned __int128 vec_srl (vector unsigned __int128,
+vector unsigned char);
+@end smallexample
+
+The above instances are extension of the exiting overloaded built-ins
+@code{vec_sld}, @code{vec_sldw}, @code{vec_slo}, @code{vec_sro}, 
@code{vec_srl}
+that are documented in the PVIPR.

  @node PowerPC Hardware Transactional Memory Built-in Functions
  @subsection PowerPC Hardware Transactional Memory Built-in Functions
diff --git 
a/gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable-int128.c 
b/gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable-int128.c
new file mode 100644
index 00000000000..bb90f489149
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable-int128.c
@@ -0,0 +1,349 @@ 
+/* { dg-do run  { target { int128 } && { power10_hw } } } */
+/* { dg-do link { target { ! power10_hw } } } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -save-temps" } */
+
+#include <altivec.h>
+
+#define DEBUG 0
+
+#if DEBUG
+#include <stdio.h>
+
+void print_i128 (unsigned __int128 val)
+{
+  printf(" 0x%016llx%016llx",
+     (unsigned long long)(val >> 64),
+     (unsigned long long)(val & 0xFFFFFFFFFFFFFFFF));
+}
+#endif
+
+extern void abort (void);
+
+#if DEBUG
+#define ACTION_2ARG_UNSIGNED(NAME, TYPE_NAME)                \
+  printf ("vec_%s (vector TYPE __int128, vector TYPE __int128) \n", 
#NAME); \
+  printf(" src_va_s128[0] =      ");                    \
+  print_i128 ((unsigned __int128) src_va_##TYPE_NAME[0]);        \
+  printf("\n");                            \
+  printf(" src_vb_uc =            0x");                \
+  for (i = 0; i < 16; i++)                         \
+    printf("%02x",  src_vb_uc[i]);                    \
+  printf("\n");                            \
+  printf(" vresult[0] =          ");                    \
+  print_i128 ((unsigned __int128) vresult[0]);                \
+  printf("\n");                            \
+  printf(" expected_vresult[0] = ");                    \
+  print_i128 ((unsigned __int128) expected_vresult[0]);        \
+  printf("\n");
+
+#define ACTION_2ARG_SIGNED(NAME, TYPE_NAME)                \
+  printf ("vec_%s (vector TYPE __int128, vector TYPE __int128) \n", 
#NAME); \
+  printf(" src_va_s128[0] =      ");                    \
+  print_i128 ((unsigned __int128) src_va_##TYPE_NAME[0]);        \
+  printf("\n");                            \
+  printf(" src_vb_sc =            0x");                \
+  for (i = 0; i < 16; i++)                         \
+    printf("%02x",  src_vb_sc[i]);                    \
+  printf("\n");                            \
+  printf(" vresult[0] =          ");                    \
+  print_i128 ((unsigned __int128) vresult[0]);                \
+  printf("\n");                            \
+  printf(" expected_vresult[0] = ");                    \
+  print_i128 ((unsigned __int128) expected_vresult[0]);        \
+  printf("\n");
+
+#define ACTION_3ARG(NAME, TYPE_NAME, CONST)                \
+  printf ("vec_%s (vector TYPE __int128, vector TYPE __int128, %s) 
\n",    \
+    #NAME, #CONST);                            \
+  printf(" src_va_s128[0] =      ");                    \
+  print_i128 ((unsigned __int128) src_va_##TYPE_NAME[0]);        \
+  printf("\n");                            \
+  printf(" src_vb_s128[0] =      ");                    \
+  print_i128 ((unsigned __int128) src_vb_##TYPE_NAME[0]);        \
+  printf("\n");                            \
+  printf(" vresult[0] =          ");                    \
+  print_i128 ((unsigned __int128) vresult[0]);                \
+  printf("\n");                            \
+  printf(" expected_vresult[0] = ");                    \
+  print_i128 ((unsigned __int128) expected_vresult[0]);        \
+  printf("\n");
+#else
+#define ACTION_2ARG(NAME, TYPE_NAME)                        \
+  abort();
+
+#define ACTION_3ARG(NAME, TYPE_NAME, CONST)                \
+  abort();
+#endif
+
+/* Second argument of the builtin is vector unsigned char.  */
+#define TEST_2ARG_UNSIGNED(NAME, TYPE, TYPE_NAME, EXP_RESULT_HI, 
EXP_RESULT_LO) \
+  {                                    \
+    vector TYPE __int128 vresult;                    \
+    vector TYPE __int128 expected_vresult;                \
+    int i;                                \
+                                        \
+    expected_vresult = (vector TYPE __int128) { EXP_RESULT_HI }; \
+    expected_vresult = (expected_vresult << 64) | \
+      (vector TYPE __int128) { EXP_RESULT_LO };            \
+    vresult = vec_##NAME (src_va_##TYPE_NAME, src_vb_uc);        \
+                                    \
+    if (!vec_all_eq (vresult,  expected_vresult)) {            \
+      ACTION_2ARG_UNSIGNED(NAME, TYPE_NAME)                \
+    }                                    \
+  }
+
+/* Second argument of the builtin is vector signed char.  */
+#define TEST_2ARG_SIGNED(NAME, TYPE, TYPE_NAME, EXP_RESULT_HI, 
EXP_RESULT_LO) \
+  {                                    \
+    vector TYPE __int128 vresult;                    \
+    vector TYPE __int128 expected_vresult;                \
+    int i;                                \
+                                        \
+    expected_vresult = (vector TYPE __int128) { EXP_RESULT_HI }; \
+    expected_vresult = (expected_vresult << 64) | \
+      (vector TYPE __int128) { EXP_RESULT_LO };            \
+    vresult = vec_##NAME (src_va_##TYPE_NAME, src_vb_sc);        \
+                                    \
+    if (!vec_all_eq (vresult,  expected_vresult)) {            \
+      ACTION_2ARG_SIGNED(NAME, TYPE_NAME)                \
+    }                                    \
+  }
+
+#define TEST_3ARG(NAME, TYPE, TYPE_NAME, CONST, EXP_RESULT_HI, 
EXP_RESULT_LO) \
+  {                                    \
+    vector TYPE __int128 vresult;                    \
+    vector TYPE __int128 expected_vresult;                \
+                                        \
+    expected_vresult = (vector TYPE __int128) { EXP_RESULT_HI }; \
+    expected_vresult = (expected_vresult << 64) | \
+      (vector TYPE __int128) { EXP_RESULT_LO };            \
+    vresult = vec_##NAME (src_va_##TYPE_NAME, src_vb_##TYPE_NAME, 
CONST);    \
+                                    \
+    if (!vec_all_eq (vresult,  expected_vresult)) {            \
+      ACTION_3ARG(NAME, TYPE_NAME, CONST)                \
+    }                                    \
+  }
+
+int
+main (int argc, char *argv [])
+{
+  vector signed __int128 vresult_s128;
+  vector signed __int128 expected_vresult_s128;
+  vector signed __int128 src_va_s128;
+  vector signed __int128 src_vb_s128;
+  vector unsigned __int128 vresult_u128;
+  vector unsigned __int128 expected_vresult_u128;
+  vector unsigned __int128 src_va_u128;
+  vector unsigned __int128 src_vb_u128;
+  vector signed char src_vb_sc;
+  vector unsigned char src_vb_uc;
+
+  /* 128-bit vector shift right tests, vec_srdb. */
+  src_va_s128 = (vector signed __int128) {0x12345678};
+  src_vb_s128 = (vector signed __int128) {0xFEDCBA90};
+  TEST_3ARG(srdb, signed, s128, 4, 0x8000000000000000, 0xFEDCBA9)
+
+  src_va_u128 = (vector unsigned __int128) { 0xFEDCBA98 };
+  src_vb_u128 = (vector unsigned __int128) { 0x76543210};
+  TEST_3ARG(srdb, unsigned, u128, 4, 0x8000000000000000, 0x07654321)
+
+  /* 128-bit vector shift left tests, vec_sldb. */
+  src_va_s128 = (vector signed __int128) {0x123456789ABCDEF0};
+  src_va_s128 = (src_va_s128 << 64)
+    | (vector signed __int128) {0x123456789ABCDEF0};
+  src_vb_s128 = (vector signed __int128) {0xFEDCBA9876543210};
+  src_vb_s128 = (src_vb_s128 << 64)
+    | (vector signed __int128) {0xFEDCBA9876543210};
+  TEST_3ARG(sldb, signed, s128, 4, 0x23456789ABCDEF01, 0x23456789ABCDEF0F)
+
+  src_va_u128 = (vector unsigned __int128) {0xFEDCBA9876543210};
+  src_va_u128 = src_va_u128 << 64
+    | (vector unsigned __int128) {0xFEDCBA9876543210};
+  src_vb_u128 = (vector unsigned __int128) {0x123456789ABCDEF0};
+  src_vb_u128 = src_vb_u128 << 64
+    | (vector unsigned __int128) {0x123456789ABCDEF0};
+  TEST_3ARG(sldb, unsigned, u128, 4, 0xEDCBA9876543210F, 
0xEDCBA98765432101)
+
+  /* Shift left by octect tests, vec_sld.  Shift is by immediate value
+     times 8. */
+  src_va_s128 = (vector signed __int128) {0x123456789ABCDEF0};
+  src_va_s128 = (src_va_s128 << 64)
+    | (vector signed __int128) {0x123456789ABCDEF0};
+  src_vb_s128 = (vector signed __int128) {0xFEDCBA9876543210};
+  src_vb_s128 = (src_vb_s128 << 64)
+    | (vector signed __int128) {0xFEDCBA9876543210};
+  TEST_3ARG(sld, signed, s128, 4, 0x9abcdef012345678, 0x9abcdef0fedcba98)
+
+  src_va_u128 = (vector unsigned __int128) {0xFEDCBA9876543210};
+  src_va_u128 = src_va_u128 << 64
+    | (vector unsigned __int128) {0xFEDCBA9876543210};
+  src_vb_u128 = (vector unsigned __int128) {0x123456789ABCDEF0};
+  src_vb_u128 = src_vb_u128 << 64
+    | (vector unsigned __int128) {0x123456789ABCDEF0};
+  TEST_3ARG(sld, unsigned, u128, 4, 0x76543210fedcba98, 0x7654321012345678)
+
+  /* Vector left shift bytes within the vector, vec_sll. */
+  src_va_s128 = (vector signed __int128) {0x123456789ABCDEF0};
+  src_va_s128 = (src_va_s128 << 64)
+    | (vector signed __int128) {0x123456789ABCDEF0};
+  src_vb_uc = (vector unsigned char) {0x01, 0x01, 0x01, 0x01,
+                      0x01, 0x01, 0x01, 0x01,
+                      0x01, 0x01, 0x01, 0x01,
+                      0x01, 0x01, 0x01, 0x01};
+  TEST_2ARG_UNSIGNED(sll, signed, s128, 0x2468acf13579bde0,
+             0x2468acf13579bde0)
+
+  src_va_u128 = (vector unsigned __int128) {0x123456789ABCDEF0};
+  src_va_u128 = src_va_u128 << 64
+    | (vector unsigned __int128) {0x123456789ABCDEF0};
+  src_vb_uc = (vector unsigned char) {0x02, 0x02, 0x02, 0x02,
+                      0x02, 0x02, 0x02, 0x02,
+                      0x02, 0x02, 0x02, 0x02,
+                      0x02, 0x02, 0x02, 0x02};
+  TEST_2ARG_UNSIGNED(sll, unsigned, u128, 0x48d159e26af37bc0,
+             0x48d159e26af37bc0)
+
+  /* Vector right shift bytes within the vector, vec_srl. */
+  src_va_s128 = (vector signed __int128) {0x123456789ABCDEF0};
+  src_va_s128 = (src_va_s128 << 64)
+    | (vector signed __int128) {0x123456789ABCDEF0};
+  src_vb_uc = (vector unsigned char) {0x01, 0x01, 0x01, 0x01,
+                      0x01, 0x01, 0x01, 0x01,
+                      0x01, 0x01, 0x01, 0x01,
+                      0x01, 0x01, 0x01, 0x01};
+  TEST_2ARG_UNSIGNED(srl, signed, s128, 0x091a2b3c4d5e6f78,
+             0x091a2b3c4d5e6f78)
+
+  src_va_u128 = (vector unsigned __int128) {0x123456789ABCDEF0};
+  src_va_u128 = src_va_u128 << 64
+    | (vector unsigned __int128) {0x123456789ABCDEF0};
+  src_vb_uc = (vector unsigned char) {0x02, 0x02, 0x02, 0x02,
+                      0x02, 0x02, 0x02, 0x02,
+                      0x02, 0x02, 0x02, 0x02,
+                      0x02, 0x02, 0x02, 0x02};
+  TEST_2ARG_UNSIGNED(srl, unsigned, u128, 0x48d159e26af37bc,
+             0x48d159e26af37bc)
+
+  /* Shift left by octect tests, vec_slo.  Shift is by immediate value
+     bytes.  Shift amount in bits 121:124.  */
+  src_va_s128 = (vector signed __int128) {0x123456789ABCDEF0};
+  src_va_s128 = (src_va_s128 << 64)
+    | (vector signed __int128) {0x123456789ABCDEF0};
+  /* Note vb_sc is Endian specific, this is just LE.  */
+  /* The left shift amount is 1 byte, i.e. 1 * 8 bits.  */
+  src_vb_sc = (vector signed char) {0x1 << 3, 0x0, 0x0, 0x0,
+                    0x0, 0x0, 0x0, 0x0,
+                    0x0, 0x0, 0x0, 0x0,
+                    0x0, 0x0, 0x0, 0x0};
+
+  TEST_2ARG_SIGNED(slo, signed, s128, 0x3456789ABCDEF012,
+           0x3456789ABCDEF000)
+  src_va_s128 = (vector signed __int128) {0x123456789ABCDEF0};
+  src_va_s128 = (src_va_s128 << 64)
+    | (vector signed __int128) {0x123456789ABCDEF0};
+  /* Note vb_sc is Endian specific, this is just LE.  */
+  /* The left shift amount is 2 bytes, i.e. 2 * 8 bits.  */
+  src_vb_uc = (vector unsigned char) {0x2 << 3, 0x0, 0x0, 0x0,
+                      0x0, 0x0, 0x0, 0x0,
+                      0x0, 0x0, 0x0, 0x0,
+                      0x0, 0x0, 0x0, 0x0};
+  TEST_2ARG_UNSIGNED(slo, signed, s128, 0x56789ABCDEF01234,
+             0x56789ABCDEF00000)
+
+  src_va_u128 = (vector unsigned __int128) {0xFEDCBA9876543210};
+  src_va_u128 = src_va_u128 << 64
+    | (vector unsigned __int128) {0xFEDCBA9876543210};
+  /* The left shift amount is 3 bytes, i.e. 3 * 8 bits.  */
+  src_vb_sc = (vector signed char) {0x03<<3, 0x0, 0x0, 0x0,
+                    0x0, 0x0, 0x0, 0x0,
+                    0x0, 0x0, 0x0, 0x0,
+                    0x00, 0x00, 0x00, 0x0};
+  TEST_2ARG_SIGNED(slo, unsigned, u128, 0x9876543210FEDCBA,
+               0x9876543210000000)
+
+  src_va_u128 = (vector unsigned __int128) {0xFEDCBA9876543210};
+  src_va_u128 = src_va_u128 << 64
+    | (vector unsigned __int128) {0xFEDCBA9876543210};
+  /* The left shift amount is 4 bytes, i.e. 4 * 8 bits.  */
+  src_vb_uc = (vector unsigned char) {0x04<<3, 0x0, 0x0, 0x0,
+                      0x0, 0x0, 0x0, 0x0,
+                      0x0, 0x0, 0x0, 0x0,
+                      0x00, 0x00, 0x00, 0x0};
+  TEST_2ARG_UNSIGNED(slo, unsigned, u128, 0x76543210FEDCBA98,
+               0x7654321000000000)
+
+  /* Shift right by octect tests, vec_sro.  Shift is by immediate value
+     times 8.  Shift amount in bits 121:124.  */
+  src_va_s128 = (vector signed __int128) {0x123456789ABCDEF0};
+  src_va_s128 = (src_va_s128 << 64)
+    | (vector signed __int128) {0x123456789ABCDEF0};
+  /* Note vb_sc is Endian specific, this is just LE.  */
+  /* The left shift amount is 1 byte, i.e. 1 * 8 bits.  */
+  src_vb_sc = (vector signed char) {0x1 << 3, 0x0, 0x0, 0x0,
+                    0x0, 0x0, 0x0, 0x0,
+                    0x0, 0x0, 0x0, 0x0,
+                    0x0, 0x0, 0x0, 0x0};
+  TEST_2ARG_SIGNED(sro, signed, s128, 0x00123456789ABCDE, 
0xF0123456789ABCDE)
+
+  src_va_s128 = (vector signed __int128) {0x123456789ABCDEF0};
+  src_va_s128 = (src_va_s128 << 64)
+    | (vector signed __int128) {0x123456789ABCDEF0};
+  /* Note vb_sc is Endian specific, this is just LE.  */
+  /* The left shift amount is 1 byte, i.e. 1 * 8 bits.  */
+  src_vb_uc = (vector unsigned char) {0x2 << 3, 0x0, 0x0, 0x0,
+                      0x0, 0x0, 0x0, 0x0,
+                      0x0, 0x0, 0x0, 0x0,
+                      0x0, 0x0, 0x0, 0x0};
+  TEST_2ARG_UNSIGNED(sro, signed, s128, 0x0000123456789ABC,
+             0xDEF0123456789ABC)
+
+  src_va_u128 = (vector unsigned __int128) {0xFEDCBA9876543210};
+  src_va_u128 = src_va_u128 << 64
+    | (vector unsigned __int128) {0xFEDCBA9876543210};
+  /* The left shift amount is 4 bytes, i.e. 4 * 8 bits.  */
+  src_vb_sc = (vector signed char) {0x03<<3, 0x0, 0x0, 0x0,
+                    0x0, 0x0, 0x0, 0x0,
+                    0x0, 0x0, 0x0, 0x0,
+                    0x00, 0x00, 0x00, 0x0};
+  TEST_2ARG_SIGNED(sro, unsigned, u128, 0x000000FEDCBA9876,
+           0x543210FEDCBA9876)
+
+  src_va_u128 = (vector unsigned __int128) {0xFEDCBA9876543210};
+  src_va_u128 = src_va_u128 << 64
+    | (vector unsigned __int128) {0xFEDCBA9876543210};
+  /* The left shift amount is 4 bytes, i.e. 4 * 8 bits.  */
+  src_vb_uc = (vector unsigned char) {0x04<<3, 0x0, 0x0, 0x0,
+                      0x0, 0x0, 0x0, 0x0,
+                      0x0, 0x0, 0x0, 0x0,
+                      0x00, 0x00, 0x00, 0x0};
+  TEST_2ARG_UNSIGNED(sro, unsigned, u128, 0x00000000FEDCBA98,
+               0x76543210FEDCBA98)
+
+  /* 128-bit vector shift left tests, vec_sldw. */
+  src_va_s128 = (vector signed __int128) {0x123456789ABCDEF0};
+  src_va_s128 = (src_va_s128 << 64)
+    | (vector signed __int128) {0x123456789ABCDEF0};
+  src_vb_s128 = (vector signed __int128) {0xFEDCBA9876543210};
+  src_vb_s128 = (src_vb_s128 << 64)
+    | (vector signed __int128) {0xFEDCBA9876543210};
+  TEST_3ARG(sldw, signed, s128, 1, 0x9ABCDEF012345678, 0x9ABCDEF0FEDCBA98)
+
+  src_va_u128 = (vector unsigned __int128) {0x123456789ABCDEF0};
+  src_va_u128 = (src_va_u128 << 64)
+    | (vector unsigned __int128) {0x123456789ABCDEF0};
+  src_vb_u128 = (vector unsigned __int128) {0xFEDCBA9876543210};
+  src_vb_u128 = (src_vb_u128 << 64)
+    | (vector unsigned __int128) {0xFEDCBA9876543210};
+  TEST_3ARG(sldw, unsigned, u128, 2, 0x123456789ABCDEF0, 
0xFEDCBA9876543210)
+