PR target/81593, Optimize PowerPC vector sets coming from a vector extracts

Message ID 20170727232113.GA8723@ibm-tiger.the-meissners.org
State New

Commit Message

Michael Meissner July 27, 2017, 11:21 p.m. UTC
This patch optimizes the PowerPC vector set operation for 64-bit doubles and
longs where the elements in the vector set may have been extracted from another
vector (PR target/81593):

Here is an example:

	vector double
	test_vpasted (vector double high, vector double low)
	{
	  vector double res;
	  res[1] = high[1];
	  res[0] = low[0];
	  return res;
	}

Previously it would generate:

        xxpermdi 12,34,34,2
        vspltisw 2,0
        xxlor 0,35,35
        xxpermdi 34,34,12,0
        xxpermdi 34,0,34,1

and with these patches, it now generates:

        xxpermdi 34,35,34,1
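
To see why one instruction is enough, here is a minimal C model of the
XXPERMDI doubleword selection.  It is a sketch for illustration, not code from
the patch: the vsr_t type and both helper names are made up, and it assumes
big-endian element order (doubleword 0 is element 0), which is what the
operand order in the output above corresponds to.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative model only: a VSX register as two doublewords,
   with dw[0] the most significant one.  */
typedef struct { uint64_t dw[2]; } vsr_t;

/* XXPERMDI XT,XA,XB,DM: the high bit of DM picks the doubleword taken
   from XA, the low bit picks the doubleword taken from XB.  */
static vsr_t
xxpermdi (vsr_t xa, vsr_t xb, unsigned dm)
{
  vsr_t xt;
  assert (dm < 4);
  xt.dw[0] = xa.dw[(dm >> 1) & 1];
  xt.dw[1] = xb.dw[dm & 1];
  return xt;
}

/* test_vpasted on big endian: "xxpermdi 34,35,34,1" with high in
   vs34 and low in vs35 yields { low[0], high[1] }.  */
static vsr_t
model_test_vpasted (vsr_t high, vsr_t low)
{
  return xxpermdi (low, high, 1);
}
```

With DM = 1 the result takes doubleword 0 from the first input and doubleword
1 from the second, which is exactly the res[0] = low[0], res[1] = high[1]
pairing of the example.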

I have tested it on a little endian power8 system and a big endian power7
system with the usual bootstrap and make checks with no regressions.  Can I
check this into the trunk?

I also built Spec 2006 with the compiler, and saw no changes in the code
generated.  This isn't surprising because it isn't something that auto
vectorization might generate by default.

[gcc]
2017-07-27  Michael Meissner  <meissner@linux.vnet.ibm.com>

	PR target/81593
	* config/rs6000/rs6000-protos.h (rs6000_emit_xxpermdi): New
	declaration.
	* config/rs6000/rs6000.c (rs6000_emit_xxpermdi): New function to
	emit XXPERMDI accessing either double word in either vector
	register inputs.
	* config/rs6000/vsx.md (vsx_concat_<mode>, VSX_D iterator):
	Rewrite VEC_CONCAT insn to call rs6000_emit_xxpermdi.  Simplify
	the constraints with the removal of the -mupper-regs-* switches.
	(vsx_concat_<mode>_1): New combiner insns to optimize CONCATs
	where either register might have come from VEC_SELECT.
	(vsx_concat_<mode>_2): Likewise.
	(vsx_concat_<mode>_3): Likewise.
	(vsx_set_<mode>, VSX_D iterator): Rewrite insn to generate a
	VEC_CONCAT rather than use an UNSPEC to specify the option.

[gcc/testsuite]
2017-07-27  Michael Meissner  <meissner@linux.vnet.ibm.com>

	PR target/81593
	* gcc.target/powerpc/vsx-extract-6.c: New test.
	* gcc.target/powerpc/vsx-extract-7.c: Likewise.

Comments

Richard Biener July 28, 2017, 7:51 a.m. UTC | #1
On Fri, Jul 28, 2017 at 1:21 AM, Michael Meissner
<meissner@linux.vnet.ibm.com> wrote:
> This patch optimizes the PowerPC vector set operation for 64-bit doubles and
> longs where the elements in the vector set may have been extracted from another
> vector (PR target/81593):
>
> Here is an example:
>
>         vector double
>         test_vpasted (vector double high, vector double low)
>         {
>           vector double res;
>           res[1] = high[1];
>           res[0] = low[0];
>           return res;
>         }

Interesting.  We expand from

  <bb 2> [100.00%] [count: INV]:
  _1 = BIT_FIELD_REF <high_4(D), 64, 64>;
  res_6 = BIT_INSERT_EXPR <res_5(D), _1, 64 (64 bits)>;
  _2 = BIT_FIELD_REF <low_7(D), 64, 0>;
  res_8 = BIT_INSERT_EXPR <res_6, _2, 0 (64 bits)>;
  return res_8;

but ideally we'd pattern-match that to a VEC_PERM_EXPR.  The bswap
pass looks like the canonical pass for this even though it's quite awkward
to fill this in.
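
In terms of element indices, the four statements above amount to a two-input
permute of (low, high) with selector {0, 3}.  The following plain C model is
only a sketch of that semantics (indices taken modulo twice the element
count); vec_perm and vpasted_as_perm are hypothetical names, not GCC
functions.

```c
#include <assert.h>
#include <stddef.h>

#define NELTS 2

/* Illustrative model of a two-input permute: each entry of SEL picks from
   the concatenation of V0 and V1, so 0..NELTS-1 name V0 and
   NELTS..2*NELTS-1 name V1.  */
static void
vec_perm (const double *v0, const double *v1, const unsigned *sel,
          double *res)
{
  for (size_t i = 0; i < NELTS; i++)
    {
      unsigned idx = sel[i] % (2 * NELTS);
      res[i] = idx < NELTS ? v0[idx] : v1[idx - NELTS];
    }
}

/* res[0] = low[0]; res[1] = high[1]; written as one permute of (low, high).  */
static void
vpasted_as_perm (const double *high, const double *low, double *res)
{
  static const unsigned sel[NELTS] = { 0, 3 };
  vec_perm (low, high, sel, res);
}
```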

So a match.pd rule would work as well here - your ppc backend patterns
are v2df specific, right?

Andrew Pinski July 28, 2017, 8:02 a.m. UTC | #2
On Fri, Jul 28, 2017 at 12:51 AM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Fri, Jul 28, 2017 at 1:21 AM, Michael Meissner
> <meissner@linux.vnet.ibm.com> wrote:
>> This patch optimizes the PowerPC vector set operation for 64-bit doubles and
>> longs where the elements in the vector set may have been extracted from another
>> vector (PR target/81593):
>>
>> Here is an example:
>>
>>         vector double
>>         test_vpasted (vector double high, vector double low)
>>         {
>>           vector double res;
>>           res[1] = high[1];
>>           res[0] = low[0];
>>           return res;
>>         }
>
> Interesting.  We expand from
>
>   <bb 2> [100.00%] [count: INV]:
>   _1 = BIT_FIELD_REF <high_4(D), 64, 64>;
>   res_6 = BIT_INSERT_EXPR <res_5(D), _1, 64 (64 bits)>;
>   _2 = BIT_FIELD_REF <low_7(D), 64, 0>;
>   res_8 = BIT_INSERT_EXPR <res_6, _2, 0 (64 bits)>;
>   return res_8;
>
> but ideally we'd pattern-match that to a VEC_PERM_EXPR.  The bswap
> pass looks like the canonical pass for this even though it's quite awkward
> to fill this in.

I was thinking about exactly this, though for the scalar use of
BIT_INSERT_EXPR/BIT_FIELD_REF.
I have a case where someone writes (this shows up in GCC too):
a->b = c->b;
a->d = c->d;
a->e = c->e;
a->f = c->f;
a->g = c->g;
a->h = c->h;

where b, d, e, f, g, h are adjacent bit-fields.  After I lowered the bit-fields I have:
_1 = BIT_FIELD_REF <a.1_3, 2, 0>;
_8 = BIT_INSERT_EXPR <c.1_4, _1, 0 (2 bits)>;
_2 = BIT_FIELD_REF <a.1_3, 4, 2>;
_9 = BIT_INSERT_EXPR <_8, _2, 2 (4 bits)>;
....
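
The scalar case can be sketched in plain C.  The helpers below are
illustrative stand-ins for BIT_FIELD_REF and BIT_INSERT_EXPR on a 32-bit word
(assuming bit 0 is the least significant bit; all names are hypothetical), and
they show that a chain of extract/insert pairs over adjacent fields is the
same as one masked merge, which is what a pattern-matching pass would want to
produce.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-in for BIT_FIELD_REF <word, size, pos>;
   SIZE must be between 1 and 31.  */
static uint32_t
bit_field_ref (uint32_t word, unsigned size, unsigned pos)
{
  assert (size >= 1 && size < 32 && pos + size <= 32);
  return (word >> pos) & ((1u << size) - 1);
}

/* Illustrative stand-in for BIT_INSERT_EXPR <word, val, pos (size bits)>.  */
static uint32_t
bit_insert (uint32_t word, uint32_t val, unsigned pos, unsigned size)
{
  assert (size >= 1 && size < 32 && pos + size <= 32);
  uint32_t mask = ((1u << size) - 1) << pos;
  return (word & ~mask) | ((val << pos) & mask);
}

/* Field-by-field copy of a 2-bit field at bit 0 and a 4-bit field at bit 2.  */
static uint32_t
copy_fieldwise (uint32_t dst, uint32_t src)
{
  dst = bit_insert (dst, bit_field_ref (src, 2, 0), 0, 2);
  dst = bit_insert (dst, bit_field_ref (src, 4, 2), 2, 4);
  return dst;
}

/* The same copy as one masked merge of the adjacent 6 bits.  */
static uint32_t
copy_merged (uint32_t dst, uint32_t src)
{
  const uint32_t mask = 0x3f;
  return (dst & ~mask) | (src & mask);
}
```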

For the vector case, can't we write it as:
_1 = BIT_FIELD_REF <high_4(D), 64, 64>;
_2 = BIT_FIELD_REF <low_7(D), 64, 0>;
res_8 = {_1, _2};

And then have some match.pd patterns (which might get complex), to
rewrite that into VEC_PERM_EXPR?
The reason I ask is that someone could just as well have written:
vector double
test_vpasted (vector double high, vector double low)
{
  vector double res = { high[1], low[0] };
  return res;
}
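
For comparison, here are the element-wise and constructor forms side by side,
using the generic GCC/Clang vector_size extension as a portable stand-in for
"vector double" (an assumption made so the sketch builds off PowerPC; the
function names are hypothetical).  Note that a constructor lists element 0
first, so the ordering that reproduces the element-wise test case is
{ low[0], high[1] }.

```c
#include <assert.h>

/* Generic GCC/Clang vector extension, used here as a portable stand-in
   for the PowerPC "vector double" type.  */
typedef double v2df __attribute__ ((vector_size (16)));

/* Element-wise form, as in the original test case.  */
static v2df
vpasted_elementwise (v2df high, v2df low)
{
  v2df res;
  res[1] = high[1];
  res[0] = low[0];
  return res;
}

/* Constructor form.  Element 0 is listed first, so this ordering is the
   one that matches the element-wise version above.  */
static v2df
vpasted_constructor (v2df high, v2df low)
{
  v2df res = { low[0], high[1] };
  return res;
}
```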


Thanks,
Andrew Pinski

Richard Biener July 28, 2017, 8:20 a.m. UTC | #3
On Fri, Jul 28, 2017 at 10:02 AM, Andrew Pinski <pinskia@gmail.com> wrote:
> On Fri, Jul 28, 2017 at 12:51 AM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Fri, Jul 28, 2017 at 1:21 AM, Michael Meissner
>> <meissner@linux.vnet.ibm.com> wrote:
>>> This patch optimizes the PowerPC vector set operation for 64-bit doubles and
>>> longs where the elements in the vector set may have been extracted from another
>>> vector (PR target/81593):
>>>
>>> Here is an example:
>>>
>>>         vector double
>>>         test_vpasted (vector double high, vector double low)
>>>         {
>>>           vector double res;
>>>           res[1] = high[1];
>>>           res[0] = low[0];
>>>           return res;
>>>         }
>>
>> Interesting.  We expand from
>>
>>   <bb 2> [100.00%] [count: INV]:
>>   _1 = BIT_FIELD_REF <high_4(D), 64, 64>;
>>   res_6 = BIT_INSERT_EXPR <res_5(D), _1, 64 (64 bits)>;
>>   _2 = BIT_FIELD_REF <low_7(D), 64, 0>;
>>   res_8 = BIT_INSERT_EXPR <res_6, _2, 0 (64 bits)>;
>>   return res_8;
>>
>> but ideally we'd pattern-match that to a VEC_PERM_EXPR.  The bswap
>> pass looks like the canonical pass for this even though it's quite awkward
>> to fill this in.
>
> I was thinking about exactly this, though for the scalar use of
> BIT_INSERT_EXPR/BIT_FIELD_REF.
> I have a case where someone writes (this shows up in GCC too):
> a->b = c->b;
> a->d = c->d;
> a->e = c->e;
> a->f = c->f;
> a->g = c->g;
> a->h = c->h;
>
> where b, d, e, f, g, h are adjacent bit-fields.  After I lowered the bit-fields I have:
> _1 = BIT_FIELD_REF <a.1_3, 2, 0>;
> _8 = BIT_INSERT_EXPR <c.1_4, _1, 0 (2 bits)>;
> _2 = BIT_FIELD_REF <a.1_3, 4, 2>;
> _9 = BIT_INSERT_EXPR <_8, _2, 2 (4 bits)>;
> ....
>
> For the vector case, can't we write it as:
> _1 = BIT_FIELD_REF <high_4(D), 64, 64>;
> _2 = BIT_FIELD_REF <low_7(D), 64, 0>;
> res_8 = {_1, _2};
>
> And then have some match.pd patterns (which might get complex), to
> rewrite that into VEC_PERM_EXPR?
> The reason I ask is that someone could just as well have written:
> vector double
> test_vpasted (vector double high, vector double low)
> {
>   vector double res = { high[1], low[0] };
>   return res;
> }

I still believe a proper pass is better than match.pd patterns (which are
awkward when dealing with "variable operand number" cases).

I believe in the end we want to "unify" SRA, bswap and store-merging
at least.  Analyze memory/component accesses, their flow and then
pattern-match the result.  bswap is good with the flow stuff but its
memory/component access analysis is too ad-hoc.

"unify" in the sense of using common infrastructure.

Richard.

Marc Glisse July 28, 2017, 8:37 a.m. UTC | #4
On Fri, 28 Jul 2017, Andrew Pinski wrote:

> For the vector case, can't we write it as:
> _1 = BIT_FIELD_REF <high_4(D), 64, 64>;
> _2 = BIT_FIELD_REF <low_7(D), 64, 0>;
> res_8 = {_1, _2};
>
> And then have some match.pd patterns (which might get complex), to
> rewrite that into VEC_PERM_EXPR?

For this last part, we have simplify_vector_constructor in 
tree-ssa-forwprop.c, which currently only recognizes VEC_PERM_EXPR of a 
single vector, but I guess it could be extended to 2 vectors. Not as good 
as a bswap revamp (which will be needed anyway at some point), but less 
work.
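
As a rough illustration of what a two-vector extension would have to
recognize, the sketch below checks whether every constructor element was
extracted from one of at most two source vectors and, if so, builds the
two-input permute selector.  This is not how simplify_vector_constructor is
implemented; the elt_ref representation and the function name are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* One constructor element: which source vector it was extracted from and
   at which index.  Hypothetical representation for illustration.  */
struct elt_ref
{
  const void *src;
  unsigned index;
};

/* If every element comes from at most two source vectors of NELTS
   elements, fill SEL with two-input permute indices and store the
   sources in V0/V1 (V1 stays NULL when only one is used).  Returns 1 on
   success, 0 if a third source shows up or an index is out of range.  */
static int
constructor_to_perm (const struct elt_ref *elts, unsigned nelts,
                     const void **v0, const void **v1, unsigned *sel)
{
  *v0 = NULL;
  *v1 = NULL;
  for (size_t i = 0; i < nelts; i++)
    {
      if (elts[i].src == NULL || elts[i].index >= nelts)
        return 0;
      if (*v0 == NULL || *v0 == elts[i].src)
        {
          *v0 = elts[i].src;
          sel[i] = elts[i].index;
        }
      else if (*v1 == NULL || *v1 == elts[i].src)
        {
          *v1 = elts[i].src;
          sel[i] = nelts + elts[i].index;
        }
      else
        return 0;
    }
  return 1;
}
```

For the constructor { low[0], high[1] } this yields the sources (low, high)
and the selector {0, 3}.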
Michael Meissner July 28, 2017, 2:44 p.m. UTC | #5
On Fri, Jul 28, 2017 at 09:51:30AM +0200, Richard Biener wrote:
> On Fri, Jul 28, 2017 at 1:21 AM, Michael Meissner
> <meissner@linux.vnet.ibm.com> wrote:
> > This patch optimizes the PowerPC vector set operation for 64-bit doubles and
> > longs where the elements in the vector set may have been extracted from another
> > vector (PR target/81593):
> >
> > Here is an example:
> >
> >         vector double
> >         test_vpasted (vector double high, vector double low)
> >         {
> >           vector double res;
> >           res[1] = high[1];
> >           res[0] = low[0];
> >           return res;
> >         }
> 
> Interesting.  We expand from
> 
>   <bb 2> [100.00%] [count: INV]:
>   _1 = BIT_FIELD_REF <high_4(D), 64, 64>;
>   res_6 = BIT_INSERT_EXPR <res_5(D), _1, 64 (64 bits)>;
>   _2 = BIT_FIELD_REF <low_7(D), 64, 0>;
>   res_8 = BIT_INSERT_EXPR <res_6, _2, 0 (64 bits)>;
>   return res_8;
> 
> but ideally we'd pattern-match that to a VEC_PERM_EXPR.  The bswap
> pass looks like the canonical pass for this even though it's quite awkward
> to fill this in.
> 
> So a match.pd rule would work as well here - your ppc backend patterns
> are v2df specific, right?

Both V2DF and V2DI.

While it would be great to have a machine independent optimization, my patches
would also work for PowerPC specific built-ins for vector extract and vector
insert.

Also, my patch replaces the UNSPEC used to create the vector with VEC_CONCAT.

Thus work going on for machine-independent support should not preclude
this patch from being accepted in the PowerPC backend.
Segher Boessenkool July 28, 2017, 9:08 p.m. UTC | #6
Hi!

On Thu, Jul 27, 2017 at 07:21:14PM -0400, Michael Meissner wrote:
> This patch optimizes the PowerPC vector set operation for 64-bit doubles and
> longs where the elements in the vector set may have been extracted from another
> vector (PR target/81593):

> 	* config/rs6000/rs6000.c (rs6000_emit_xxpermdi): New function to
> 	emit XXPERMDI accessing either double word in either vector
> 	register inputs.

"emit" is not a good name for this: that is generally used for something
that does emit_insn, i.e. put an insn in the instruction stream.  This
function returns a string a define_insn can return.  For the rl* insns
I called the similar functions rs6000_insn_for_*, maybe something like
that is better here?

> +/* Emit a XXPERMDI instruction that can extract from either double word of the
> +   two arguments.  ELEMENT1 and ELEMENT2 are either NULL or they are 0/1 giving
> +   which double word to be used for the operand.  */
> +
> +const char *
> +rs6000_emit_xxpermdi (rtx operands[], rtx element1, rtx element2)
> +{
> +  int op1_dword = (!element1) ? 0 : INTVAL (element1);
> +  int op2_dword = (!element2) ? 0 : INTVAL (element2);
> +
> +  gcc_assert (IN_RANGE (op1_dword | op2_dword, 0, 1));
> +
> +  if (BYTES_BIG_ENDIAN)
> +    {
> +      operands[3] = GEN_INT (2*op1_dword + op2_dword);
> +      return "xxpermdi %x0,%x1,%x2,%3";
> +    }
> +  else
> +    {
> +      if (element1)
> +	op1_dword = 1 - op1_dword;
> +
> +      if (element2)
> +	op2_dword = 1 - op2_dword;
> +
> +      operands[3] = GEN_INT (op1_dword + 2*op2_dword);
> +      return "xxpermdi %x0,%x2,%x1,%3";
> +    }
> +}

I think calling this with the rtx elementN args makes this only more
complicated (the function comment doesn't say what they are or what
NULL means, btw).
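
A sketch of what a plain-integer interface might look like follows.  The
function name, the -1 convention for "scalar operand" (the NULL case in the
patch), and the swap out-parameter are assumptions for illustration; only the
arithmetic mirrors the quoted function.

```c
#include <assert.h>

/* Hypothetical plain-integer variant of the selector computation in the
   patch.  ELT1/ELT2 are 0 or 1 for a doubleword extracted from a vector,
   or -1 for a scalar operand.  *SWAP is set when the two inputs must be
   emitted in reverse order.  Returns the XXPERMDI immediate.  */
static int
xxpermdi_selector (int big_endian, int elt1, int elt2, int *swap)
{
  int dw1 = elt1 < 0 ? 0 : elt1;
  int dw2 = elt2 < 0 ? 0 : elt2;

  assert (dw1 <= 1 && dw2 <= 1);

  if (big_endian)
    {
      *swap = 0;
      return 2 * dw1 + dw2;
    }

  /* Little endian: element N lives in doubleword 1 - N, and the inputs
     trade places.  */
  if (elt1 >= 0)
    dw1 = 1 - dw1;
  if (elt2 >= 0)
    dw2 = 1 - dw2;
  *swap = 1;
  return dw1 + 2 * dw2;
}
```

For the test_vpasted case (element 0 of the first input, element 1 of the
second) this gives immediate 1 on both endiannesses, with the inputs swapped
on little endian.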

>  (define_insn "vsx_concat_<mode>"
> -  [(set (match_operand:VSX_D 0 "gpc_reg_operand" "=<VSa>,we")
> +  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa,we")
>  	(vec_concat:VSX_D
> -	 (match_operand:<VS_scalar> 1 "gpc_reg_operand" "<VS_64reg>,b")
> -	 (match_operand:<VS_scalar> 2 "gpc_reg_operand" "<VS_64reg>,b")))]
> +	 (match_operand:<VS_scalar> 1 "gpc_reg_operand" "wa,b")
> +	 (match_operand:<VS_scalar> 2 "gpc_reg_operand" "wa,b")))]
>    "VECTOR_MEM_VSX_P (<MODE>mode)"
>  {
>    if (which_alternative == 0)
> -    return (BYTES_BIG_ENDIAN
> -	    ? "xxpermdi %x0,%x1,%x2,0"
> -	    : "xxpermdi %x0,%x2,%x1,0");
> +    return rs6000_emit_xxpermdi (operands, NULL_RTX, NULL_RTX);
>  
>    else if (which_alternative == 1)
> -    return (BYTES_BIG_ENDIAN
> +    return (VECTOR_ELT_ORDER_BIG
>  	    ? "mtvsrdd %x0,%1,%2"
>  	    : "mtvsrdd %x0,%2,%1");

This one could be

{
  if (!BYTES_BIG_ENDIAN)
    std::swap (operands[1], operands[2]);

  switch (which_alternative)
    {
    case 0:
      return "xxpermdi %x0,%x1,%x2,0";
    case 1:
      return "mtvsrdd %x0,%1,%2";
    default:
      gcc_unreachable ();
    }
}

(Could/should we use xxmrghd instead?  Do all supported assemblers know
that extended mnemonic, is it actually more readable?)

> --- gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c	(svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c)	(revision 0)
> +++ gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c	(.../gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c)	(revision 250640)
> @@ -0,0 +1,15 @@
> +/* { dg-do compile { target { powerpc*-*-* } } } */
> +/* { dg-skip-if "" { powerpc*-*-darwin* } } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-options "-O2 -mvsx" } */
> +
> +vector double
> +test_vpasted (vector double high, vector double low)
> +{
> +  vector double res;
> +  res[1] = high[1];
> +  res[0] = low[0];
> +  return res;
> +}
> +
> +/* { dg-final { scan-assembler-times {\mxxpermdi\M} 1 } } */

In this and the other testcase, should you test no other insns at all
are generated?


Segher
Bill Schmidt July 30, 2017, 2 p.m. UTC | #7
> On Jul 28, 2017, at 4:08 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> Hi!
> 
> On Thu, Jul 27, 2017 at 07:21:14PM -0400, Michael Meissner wrote:
>> This patch optimizes the PowerPC vector set operation for 64-bit doubles and
>> longs where the elements in the vector set may have been extracted from another
>> vector (PR target/81593):
> 
>> 	* config/rs6000/rs6000.c (rs6000_emit_xxpermdi): New function to
>> 	emit XXPERMDI accessing either double word in either vector
>> 	register inputs.
> 
> "emit" is not a good name for this: that is generally used for something
> that does emit_insn, i.e. put an insn in the instruction stream.  This
> function returns a string a define_insn can return.  For the rl* insns
> I called the similar functions rs6000_insn_for_*, maybe something like
> that is better here?
> 
>> +/* Emit a XXPERMDI instruction that can extract from either double word of the
>> +   two arguments.  ELEMENT1 and ELEMENT2 are either NULL or they are 0/1 giving
>> +   which double word to be used for the operand.  */
>> +
>> +const char *
>> +rs6000_emit_xxpermdi (rtx operands[], rtx element1, rtx element2)
>> +{
>> +  int op1_dword = (!element1) ? 0 : INTVAL (element1);
>> +  int op2_dword = (!element2) ? 0 : INTVAL (element2);
>> +
>> +  gcc_assert (IN_RANGE (op1_dword | op2_dword, 0, 1));
>> +
>> +  if (BYTES_BIG_ENDIAN)
>> +    {
>> +      operands[3] = GEN_INT (2*op1_dword + op2_dword);
>> +      return "xxpermdi %x0,%x1,%x2,%3";
>> +    }
>> +  else
>> +    {
>> +      if (element1)
>> +	op1_dword = 1 - op1_dword;
>> +
>> +      if (element2)
>> +	op2_dword = 1 - op2_dword;
>> +
>> +      operands[3] = GEN_INT (op1_dword + 2*op2_dword);
>> +      return "xxpermdi %x0,%x2,%x1,%3";
>> +    }
>> +}
> 
> I think calling this with the rtx elementN args makes this only more
> complicated (the function comment doesn't say what they are or what
> NULL means, btw).
> 
>> (define_insn "vsx_concat_<mode>"
>> -  [(set (match_operand:VSX_D 0 "gpc_reg_operand" "=<VSa>,we")
>> +  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa,we")
>> 	(vec_concat:VSX_D
>> -	 (match_operand:<VS_scalar> 1 "gpc_reg_operand" "<VS_64reg>,b")
>> -	 (match_operand:<VS_scalar> 2 "gpc_reg_operand" "<VS_64reg>,b")))]
>> +	 (match_operand:<VS_scalar> 1 "gpc_reg_operand" "wa,b")
>> +	 (match_operand:<VS_scalar> 2 "gpc_reg_operand" "wa,b")))]
>>   "VECTOR_MEM_VSX_P (<MODE>mode)"
>> {
>>   if (which_alternative == 0)
>> -    return (BYTES_BIG_ENDIAN
>> -	    ? "xxpermdi %x0,%x1,%x2,0"
>> -	    : "xxpermdi %x0,%x2,%x1,0");
>> +    return rs6000_emit_xxpermdi (operands, NULL_RTX, NULL_RTX);
>> 
>>   else if (which_alternative == 1)
>> -    return (BYTES_BIG_ENDIAN
>> +    return (VECTOR_ELT_ORDER_BIG
>> 	    ? "mtvsrdd %x0,%1,%2"
>> 	    : "mtvsrdd %x0,%2,%1");
> 
> This one could be
> 
> {
>  if (!BYTES_BIG_ENDIAN)

!VECTOR_ELT_ORDER_BIG (to accommodate -maltivec=be).  (We have some general bitrot associated with -maltivec=be that needs to be addressed, or we need to give up on it altogether.  Still of two minds about this.)

Bill

Michael Meissner July 31, 2017, 5:38 p.m. UTC | #8
On Fri, Jul 28, 2017 at 04:08:50PM -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Thu, Jul 27, 2017 at 07:21:14PM -0400, Michael Meissner wrote:
> > This patch optimizes the PowerPC vector set operation for 64-bit doubles and
> > longs where the elements in the vector set may have been extracted from another
> > vector (PR target/81593):
> 
> > 	* config/rs6000/rs6000.c (rs6000_emit_xxpermdi): New function to
> > 	emit XXPERMDI accessing either double word in either vector
> > 	register inputs.
> 
> "emit" is not a good name for this: that is generally used for something
> that does emit_insn, i.e. put an insn in the instruction stream.  This
> function returns a string a define_insn can return.  For the rl* insns
> I called the similar functions rs6000_insn_for_*, maybe something like
> that is better here?

Yeah, I should have used rs6000_output_xxpermdi or similar (or output_xxpermdi,
etc.), which is what the other functions used.

> > +/* Emit a XXPERMDI instruction that can extract from either double word of the
> > +   two arguments.  ELEMENT1 and ELEMENT2 are either NULL or they are 0/1 giving
> > +   which double word to be used for the operand.  */
> > +
> > +const char *
> > +rs6000_emit_xxpermdi (rtx operands[], rtx element1, rtx element2)
> > +{
> > +  int op1_dword = (!element1) ? 0 : INTVAL (element1);
> > +  int op2_dword = (!element2) ? 0 : INTVAL (element2);
> > +
> > +  gcc_assert (IN_RANGE (op1_dword | op2_dword, 0, 1));
> > +
> > +  if (BYTES_BIG_ENDIAN)
> > +    {
> > +      operands[3] = GEN_INT (2*op1_dword + op2_dword);
> > +      return "xxpermdi %x0,%x1,%x2,%3";
> > +    }
> > +  else
> > +    {
> > +      if (element1)
> > +	op1_dword = 1 - op1_dword;
> > +
> > +      if (element2)
> > +	op2_dword = 1 - op2_dword;
> > +
> > +      operands[3] = GEN_INT (op1_dword + 2*op2_dword);
> > +      return "xxpermdi %x0,%x2,%x1,%3";
> > +    }
> > +}
> 
> I think calling this with the rtx elementN args makes this only more
> complicated (the function comment doesn't say what they are or what
> NULL means, btw).

Ok, let me think on it.

> 
> >  (define_insn "vsx_concat_<mode>"
> > -  [(set (match_operand:VSX_D 0 "gpc_reg_operand" "=<VSa>,we")
> > +  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa,we")
> >  	(vec_concat:VSX_D
> > -	 (match_operand:<VS_scalar> 1 "gpc_reg_operand" "<VS_64reg>,b")
> > -	 (match_operand:<VS_scalar> 2 "gpc_reg_operand" "<VS_64reg>,b")))]
> > +	 (match_operand:<VS_scalar> 1 "gpc_reg_operand" "wa,b")
> > +	 (match_operand:<VS_scalar> 2 "gpc_reg_operand" "wa,b")))]
> >    "VECTOR_MEM_VSX_P (<MODE>mode)"
> >  {
> >    if (which_alternative == 0)
> > -    return (BYTES_BIG_ENDIAN
> > -	    ? "xxpermdi %x0,%x1,%x2,0"
> > -	    : "xxpermdi %x0,%x2,%x1,0");
> > +    return rs6000_emit_xxpermdi (operands, NULL_RTX, NULL_RTX);
> >  
> >    else if (which_alternative == 1)
> > -    return (BYTES_BIG_ENDIAN
> > +    return (VECTOR_ELT_ORDER_BIG
> >  	    ? "mtvsrdd %x0,%1,%2"
> >  	    : "mtvsrdd %x0,%2,%1");
> 
> This one could be
> 
> {
>   if (!BYTES_BIG_ENDIAN)
>     std::swap (operands[1], operands[2]);
> 
>   switch (which_alternative)
>     {
>     case 0:
>       return "xxpermdi %x0,%x1,%x2,0";
>     case 1:
>       return "mtvsrdd %x0,%1,%2";
>     default:
>       gcc_unreachable ();
>     }
> }

> (Could/should we use xxmrghd instead?  Do all supported assemblers know
> that extended mnemonic, is it actually more readable?)

For me no, xxpermdi is clearer.  But if you want xxmrghd, I can do it.

> > --- gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c	(svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c)	(revision 0)
> > +++ gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c	(.../gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c)	(revision 250640)
> > @@ -0,0 +1,15 @@
> > +/* { dg-do compile { target { powerpc*-*-* } } } */
> > +/* { dg-skip-if "" { powerpc*-*-darwin* } } */
> > +/* { dg-require-effective-target powerpc_vsx_ok } */
> > +/* { dg-options "-O2 -mvsx" } */
> > +
> > +vector double
> > +test_vpasted (vector double high, vector double low)
> > +{
> > +  vector double res;
> > +  res[1] = high[1];
> > +  res[0] = low[0];
> > +  return res;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {\mxxpermdi\M} 1 } } */
> 
> In this and the other testcase, should you test no other insns at all
> are generated?

It is kind of hard to test a negative, without trying to guess what possible
instructions could be generated.
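
One partial check that avoids open-ended guessing is to assert that the
instructions specific to the old sequence are gone.  This is only a sketch,
not part of the submitted tests; it covers just the vspltisw and xxlor from
the previous output, while the existing count of one xxpermdi already rules
out the extra permutes.

```c
/* { dg-final { scan-assembler-times {\mxxpermdi\M} 1 } } */
/* { dg-final { scan-assembler-not {\mvspltisw\M} } } */
/* { dg-final { scan-assembler-not {\mxxlor\M} } } */
```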
Michael Meissner July 31, 2017, 5:40 p.m. UTC | #9
On Sun, Jul 30, 2017 at 09:00:58AM -0500, Bill Schmidt wrote:
> >> (define_insn "vsx_concat_<mode>"
> >> -  [(set (match_operand:VSX_D 0 "gpc_reg_operand" "=<VSa>,we")
> >> +  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa,we")
> >> 	(vec_concat:VSX_D
> >> -	 (match_operand:<VS_scalar> 1 "gpc_reg_operand" "<VS_64reg>,b")
> >> -	 (match_operand:<VS_scalar> 2 "gpc_reg_operand" "<VS_64reg>,b")))]
> >> +	 (match_operand:<VS_scalar> 1 "gpc_reg_operand" "wa,b")
> >> +	 (match_operand:<VS_scalar> 2 "gpc_reg_operand" "wa,b")))]
> >>   "VECTOR_MEM_VSX_P (<MODE>mode)"
> >> {
> >>   if (which_alternative == 0)
> >> -    return (BYTES_BIG_ENDIAN
> >> -	    ? "xxpermdi %x0,%x1,%x2,0"
> >> -	    : "xxpermdi %x0,%x2,%x1,0");
> >> +    return rs6000_emit_xxpermdi (operands, NULL_RTX, NULL_RTX);
> >> 
> >>   else if (which_alternative == 1)
> >> -    return (BYTES_BIG_ENDIAN
> >> +    return (VECTOR_ELT_ORDER_BIG
> >> 	    ? "mtvsrdd %x0,%1,%2"
> >> 	    : "mtvsrdd %x0,%2,%1");
> > 
> > This one could be
> > 
> > {
> >  if (!BYTES_BIG_ENDIAN)
> 
> !VECTOR_ELT_ORDER_BIG (to accommodate -maltivec=be).  (We have some general bitrot associated with -maltivec=be that needs to be addressed, or we need to give up on it altogether.  Still of two minds about this.)
> 
> Bill

In this case, I believe I tested -maltivec=be, and BYTES_BIG_ENDIAN is correct
(I originally had it using VECTOR_ELT_ORDER_BIG and got failures).  But I need
to look at it again.
Bill Schmidt July 31, 2017, 5:42 p.m. UTC | #10
> On Jul 31, 2017, at 12:40 PM, Michael Meissner <meissner@linux.vnet.ibm.com> wrote:
> 
> On Sun, Jul 30, 2017 at 09:00:58AM -0500, Bill Schmidt wrote:
>>>> (define_insn "vsx_concat_<mode>"
>>>> -  [(set (match_operand:VSX_D 0 "gpc_reg_operand" "=<VSa>,we")
>>>> +  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa,we")
>>>> 	(vec_concat:VSX_D
>>>> -	 (match_operand:<VS_scalar> 1 "gpc_reg_operand" "<VS_64reg>,b")
>>>> -	 (match_operand:<VS_scalar> 2 "gpc_reg_operand" "<VS_64reg>,b")))]
>>>> +	 (match_operand:<VS_scalar> 1 "gpc_reg_operand" "wa,b")
>>>> +	 (match_operand:<VS_scalar> 2 "gpc_reg_operand" "wa,b")))]
>>>>  "VECTOR_MEM_VSX_P (<MODE>mode)"
>>>> {
>>>>  if (which_alternative == 0)
>>>> -    return (BYTES_BIG_ENDIAN
>>>> -	    ? "xxpermdi %x0,%x1,%x2,0"
>>>> -	    : "xxpermdi %x0,%x2,%x1,0");
>>>> +    return rs6000_emit_xxpermdi (operands, NULL_RTX, NULL_RTX);
>>>> 
>>>>  else if (which_alternative == 1)
>>>> -    return (BYTES_BIG_ENDIAN
>>>> +    return (VECTOR_ELT_ORDER_BIG
>>>> 	    ? "mtvsrdd %x0,%1,%2"
>>>> 	    : "mtvsrdd %x0,%2,%1");
>>> 
>>> This one could be
>>> 
>>> {
>>> if (!BYTES_BIG_ENDIAN)
>> 
>> !VECTOR_ELT_ORDER_BIG (to accommodate -maltivec=be).  (We have some general bitrot associated with -maltivec=be that needs to be addressed, or we need to give up on it altogether.  Still of two minds about this.)
>> 
>> Bill
> 
> In this case, I believe I tested -maltivec=be, and BYTES_BIG_ENDIAN is correct
> (I originally had it using VECTOR_ELT_ORDER_BIG and got failures).  But I need
> to look at it again.

Hi Mike,

I think you misunderstood me.  You had it right (you did move to VECTOR_ELT_ORDER_BIG here);
I just wanted to clarify that Segher's suggestion would also need to use VECTOR_ELT_ORDER_BIG.

Thanks,
Bill

> 
> -- 
> Michael Meissner, IBM
> IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
> email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797
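
For readers following the endianness discussion above, here is a minimal
stand-alone sketch of the immediate computation done by rs6000_emit_xxpermdi
in the patch.  It is a host-side model, not GCC code.  The names vsr_t,
xxpermdi_plan, model_xxpermdi, plan_xxpermdi, read_operand and
plan_matches_concat are made up for this illustration, and -1 stands in for
the patch's NULL_RTX (a scalar operand, assumed to live in doubleword 0 of
its register).  The model assumes the usual XXPERMDI semantics: the high bit
of the immediate selects the doubleword of the first source and the low bit
selects the doubleword of the second source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A 128-bit VSX register viewed as two doublewords in register order:
   dword[0] is the most significant half regardless of endianness.  */
typedef struct { uint64_t dword[2]; } vsr_t;

/* Hypothetical result type: the immediate, and whether the two source
   registers are emitted in swapped order (the little endian case).  */
typedef struct { int dm; bool swap_sources; } xxpermdi_plan;

/* Model of "xxpermdi XT,XA,XB,DM": the high bit of DM picks the doubleword
   of XA copied to XT.dword[0], the low bit picks the doubleword of XB
   copied to XT.dword[1].  */
static vsr_t
model_xxpermdi (vsr_t xa, vsr_t xb, int dm)
{
  vsr_t xt;
  xt.dword[0] = xa.dword[(dm >> 1) & 1];
  xt.dword[1] = xb.dword[dm & 1];
  return xt;
}

/* Mirror of the arithmetic in the patch.  ELT1/ELT2 are -1 for a scalar
   operand or the 0/1 vector element number taken from a VEC_SELECT.  */
static xxpermdi_plan
plan_xxpermdi (bool big_endian, int elt1, int elt2)
{
  int op1_dword = elt1 < 0 ? 0 : elt1;
  int op2_dword = elt2 < 0 ? 0 : elt2;
  xxpermdi_plan plan;

  assert ((op1_dword | op2_dword) >= 0 && (op1_dword | op2_dword) <= 1);

  if (big_endian)
    {
      plan.dm = 2 * op1_dword + op2_dword;
      plan.swap_sources = false;
    }
  else
    {
      /* Element numbers run the other way on little endian.  */
      if (elt1 >= 0)
        op1_dword = 1 - op1_dword;
      if (elt2 >= 0)
        op2_dword = 1 - op2_dword;
      plan.dm = op1_dword + 2 * op2_dword;
      plan.swap_sources = true;
    }
  return plan;
}

/* Read vector element ELT of register R, or the scalar if ELT < 0.  */
static uint64_t
read_operand (bool big_endian, vsr_t r, int elt)
{
  if (elt < 0)
    return r.dword[0];
  return r.dword[big_endian ? elt : 1 - elt];
}

/* True if the planned instruction yields vec_concat (op1[elt1], op2[elt2]),
   i.e. result element 0 comes from op1 and element 1 from op2.  */
static bool
plan_matches_concat (bool big_endian, int elt1, int elt2)
{
  vsr_t op1 = { { 0x1111, 0x2222 } };
  vsr_t op2 = { { 0x3333, 0x4444 } };
  xxpermdi_plan plan = plan_xxpermdi (big_endian, elt1, elt2);
  vsr_t res = (plan.swap_sources
               ? model_xxpermdi (op2, op1, plan.dm)
               : model_xxpermdi (op1, op2, plan.dm));

  return (read_operand (big_endian, res, 0)
          == read_operand (big_endian, op1, elt1)
          && read_operand (big_endian, res, 1)
             == read_operand (big_endian, op2, elt2));
}
```

Under these assumptions, the test_vpasted example from the patch description
corresponds to vec_concat (low[0], high[1]), and plan_xxpermdi gives an
immediate of 1 for both byte orders; only the order of the two source
registers differs.  Note that this models BYTES_BIG_ENDIAN only and says
nothing about the -maltivec=be question raised above.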

Patch

Index: gcc/config/rs6000/rs6000-protos.h
===================================================================
--- gcc/config/rs6000/rs6000-protos.h	(svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000/rs6000-protos.h)	(revision 250577)
+++ gcc/config/rs6000/rs6000-protos.h	(.../gcc/config/rs6000/rs6000-protos.h)	(working copy)
@@ -233,6 +233,7 @@  extern void rs6000_asm_output_dwarf_pcre
 					   const char *label);
 extern void rs6000_asm_output_dwarf_datarel (FILE *file, int size,
 					     const char *label);
+extern const char *rs6000_emit_xxpermdi (rtx[], rtx, rtx);
 
 /* Declare functions in rs6000-c.c */
 
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000/rs6000.c)	(revision 250577)
+++ gcc/config/rs6000/rs6000.c	(.../gcc/config/rs6000/rs6000.c)	(working copy)
@@ -39167,6 +39167,38 @@  rs6000_optab_supported_p (int op, machin
       return true;
     }
 }
+
+
+/* Emit an XXPERMDI instruction that can extract from either double word of the
+   two arguments.  ELEMENT1 and ELEMENT2 are either NULL or a 0/1 constant
+   giving which double word of the corresponding operand to use.  */
+
+const char *
+rs6000_emit_xxpermdi (rtx operands[], rtx element1, rtx element2)
+{
+  int op1_dword = (!element1) ? 0 : INTVAL (element1);
+  int op2_dword = (!element2) ? 0 : INTVAL (element2);
+
+  gcc_assert (IN_RANGE (op1_dword | op2_dword, 0, 1));
+
+  if (BYTES_BIG_ENDIAN)
+    {
+      operands[3] = GEN_INT (2*op1_dword + op2_dword);
+      return "xxpermdi %x0,%x1,%x2,%3";
+    }
+  else
+    {
+      if (element1)
+	op1_dword = 1 - op1_dword;
+
+      if (element2)
+	op2_dword = 1 - op2_dword;
+
+      operands[3] = GEN_INT (op1_dword + 2*op2_dword);
+      return "xxpermdi %x0,%x2,%x1,%3";
+    }
+}
+
 
 struct gcc_target targetm = TARGET_INITIALIZER;
 
Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000/vsx.md)	(revision 250577)
+++ gcc/config/rs6000/vsx.md	(.../gcc/config/rs6000/vsx.md)	(working copy)
@@ -2366,19 +2366,17 @@  (define_insn "*vsx_float_fix_v2df2"
 
 ;; Build a V2DF/V2DI vector from two scalars
 (define_insn "vsx_concat_<mode>"
-  [(set (match_operand:VSX_D 0 "gpc_reg_operand" "=<VSa>,we")
+  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa,we")
 	(vec_concat:VSX_D
-	 (match_operand:<VS_scalar> 1 "gpc_reg_operand" "<VS_64reg>,b")
-	 (match_operand:<VS_scalar> 2 "gpc_reg_operand" "<VS_64reg>,b")))]
+	 (match_operand:<VS_scalar> 1 "gpc_reg_operand" "wa,b")
+	 (match_operand:<VS_scalar> 2 "gpc_reg_operand" "wa,b")))]
   "VECTOR_MEM_VSX_P (<MODE>mode)"
 {
   if (which_alternative == 0)
-    return (BYTES_BIG_ENDIAN
-	    ? "xxpermdi %x0,%x1,%x2,0"
-	    : "xxpermdi %x0,%x2,%x1,0");
+    return rs6000_emit_xxpermdi (operands, NULL_RTX, NULL_RTX);
 
   else if (which_alternative == 1)
-    return (BYTES_BIG_ENDIAN
+    return (VECTOR_ELT_ORDER_BIG
 	    ? "mtvsrdd %x0,%1,%2"
 	    : "mtvsrdd %x0,%2,%1");
 
@@ -2387,6 +2385,47 @@  (define_insn "vsx_concat_<mode>"
 }
   [(set_attr "type" "vecperm")])
 
+;; Combiner patterns to allow creating XXPERMDI's to access either double
+;; word in a vector register.  Note, rs6000_emit_xxpermdi expects
+;; operands[0..2] to be the vector registers.
+(define_insn "*vsx_concat_<mode>_1"
+  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
+	(vec_concat:VSX_D
+	 (vec_select:<VS_scalar>
+	  (match_operand:VSX_D 1 "gpc_reg_operand" "wa")
+	  (parallel [(match_operand:QI 3 "const_0_to_1_operand" "n")]))
+	 (match_operand:<VS_scalar> 2 "gpc_reg_operand" "wa")))]
+  "VECTOR_MEM_VSX_P (<MODE>mode)"
+{
+  return rs6000_emit_xxpermdi (operands, operands[3], NULL_RTX);
+})
+
+(define_insn "*vsx_concat_<mode>_2"
+  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
+	(vec_concat:VSX_D
+	 (match_operand:<VS_scalar> 1 "gpc_reg_operand" "wa")
+	 (vec_select:<VS_scalar>
+	  (match_operand:VSX_D 2 "gpc_reg_operand" "wa")
+	  (parallel [(match_operand:QI 3 "const_0_to_1_operand" "n")]))))]
+  "VECTOR_MEM_VSX_P (<MODE>mode)"
+{
+  return rs6000_emit_xxpermdi (operands, NULL_RTX, operands[3]);
+})
+
+(define_insn "*vsx_concat_<mode>_3"
+  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
+	(vec_concat:VSX_D
+	 (vec_select:<VS_scalar>
+	  (match_operand:VSX_D 1 "gpc_reg_operand" "wa")
+	  (parallel [(match_operand:QI 3 "const_0_to_1_operand" "n")]))
+	 (vec_select:<VS_scalar>
+	  (match_operand:VSX_D 2 "gpc_reg_operand" "wa")
+	  (parallel [(match_operand:QI 4 "const_0_to_1_operand" "n")]))))]
+  "VECTOR_MEM_VSX_P (<MODE>mode)"
+{
+  return rs6000_emit_xxpermdi (operands, operands[3], operands[4]);
+})
+
 ;; Special purpose concat using xxpermdi to glue two single precision values
 ;; together, relying on the fact that internally scalar floats are represented
 ;; as doubles.  This is used to initialize a V4SF vector with 4 floats
@@ -2587,25 +2626,35 @@  (define_expand "vsx_set_v1ti"
   DONE;
 })
 
-;; Set the element of a V2DI/VD2F mode
-(define_insn "vsx_set_<mode>"
-  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wd,?<VSa>")
-	(unspec:VSX_D
-	 [(match_operand:VSX_D 1 "vsx_register_operand" "wd,<VSa>")
-	  (match_operand:<VS_scalar> 2 "vsx_register_operand" "<VS_64reg>,<VSa>")
-	  (match_operand:QI 3 "u5bit_cint_operand" "i,i")]
-	 UNSPEC_VSX_SET))]
+;; Rewrite V2DF/V2DI set in terms of VEC_CONCAT
+(define_expand "vsx_set_<mode>"
+  [(use (match_operand:VSX_D 0 "vsx_register_operand"))
+   (use (match_operand:VSX_D 1 "vsx_register_operand"))
+   (use (match_operand:<VS_scalar> 2 "gpc_reg_operand"))
+   (use (match_operand:QI 3 "const_0_to_1_operand"))]
   "VECTOR_MEM_VSX_P (<MODE>mode)"
 {
-  int idx_first = BYTES_BIG_ENDIAN ? 0 : 1;
-  if (INTVAL (operands[3]) == idx_first)
-    return \"xxpermdi %x0,%x2,%x1,1\";
-  else if (INTVAL (operands[3]) == 1 - idx_first)
-    return \"xxpermdi %x0,%x1,%x2,0\";
+  rtx dest = operands[0];
+  rtx vec_reg = operands[1];
+  rtx value = operands[2];
+  rtx ele = operands[3];
+  rtx tmp = gen_reg_rtx (<VS_scalar>mode);
+
+  if (ele == const0_rtx)
+    {
+      emit_insn (gen_vsx_extract_<mode> (tmp, vec_reg, const1_rtx));
+      emit_insn (gen_vsx_concat_<mode> (dest, value, tmp));
+      DONE;
+    }
+  else if (ele == const1_rtx)
+    {
+      emit_insn (gen_vsx_extract_<mode> (tmp, vec_reg, const0_rtx));
+      emit_insn (gen_vsx_concat_<mode> (dest, tmp, value));
+      DONE;
+    }
   else
     gcc_unreachable ();
-}
-  [(set_attr "type" "vecperm")])
+})
 
 ;; Extract a DF/DI element from V2DF/V2DI
 ;; Optimize cases were we can do a simple or direct move.
Index: gcc/testsuite/gcc.target/powerpc/vsx-extract-6.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vsx-extract-6.c	(svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc/vsx-extract-6.c)	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vsx-extract-6.c	(.../gcc/testsuite/gcc.target/powerpc/vsx-extract-6.c)	(revision 250640)
@@ -0,0 +1,15 @@ 
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mvsx" } */
+
+vector unsigned long
+test_vpasted (vector unsigned long high, vector unsigned long low)
+{
+  vector unsigned long res;
+  res[1] = high[1];
+  res[0] = low[0];
+  return res;
+}
+
+/* { dg-final { scan-assembler-times {\mxxpermdi\M} 1 } } */
Index: gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c	(svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c)	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c	(.../gcc/testsuite/gcc.target/powerpc/vsx-extract-7.c)	(revision 250640)
@@ -0,0 +1,15 @@ 
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mvsx" } */
+
+vector double
+test_vpasted (vector double high, vector double low)
+{
+  vector double res;
+  res[1] = high[1];
+  res[0] = low[0];
+  return res;
+}
+
+/* { dg-final { scan-assembler-times {\mxxpermdi\M} 1 } } */
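
The two new tests are compile-only and just count XXPERMDI instructions.  As
a complement, here is a small run-time sketch of the same paste operation,
checking the values rather than the generated code.  It is not part of the
patch.  It assumes GCC's generic vector extension (vector_size) in place of
the AltiVec "vector double" type so that it also builds on non-PowerPC
hosts, and the names v2df, paste_v2df and check_paste are made up for this
illustration.

```c
#include <assert.h>

/* Generic GCC/Clang vector of two doubles; an assumption standing in for
   the AltiVec "vector double" type used by the tests in the patch.  */
typedef double v2df __attribute__ ((vector_size (16)));

/* Same body as test_vpasted: element 1 from HIGH, element 0 from LOW.  */
v2df
paste_v2df (v2df high, v2df low)
{
  v2df res;
  res[1] = high[1];
  res[0] = low[0];
  return res;
}

/* Check that each result element came from the intended input.  */
void
check_paste (void)
{
  v2df high = { 1.0, 2.0 };
  v2df low = { 3.0, 4.0 };
  v2df res = paste_v2df (high, low);

  assert (res[0] == 3.0);	/* low[0] */
  assert (res[1] == 2.0);	/* high[1] */
}
```

Calling check_paste from main at -O2 on both a little endian and a big
endian system would exercise the element ordering that the combiner
patterns have to preserve.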