
[rs6000] Prefer vspltisw/h over xxspltib+instruction when available

Message ID 73436B77-BC85-4919-975E-915A6D93F585@linux.vnet.ibm.com
State New

Commit Message

Bill Schmidt June 21, 2016, 8:14 p.m. UTC
Hi,

I discovered recently that, with -mcpu=power9, an attempt to generate a vspltish instruction resulted instead in an xxspltib followed by a vupkhsb.  This is semantically correct but the extra instruction is not optimal.  I found that there was some logic in xxspltib_constant_p to do special casing for const_vector with small constants, but not for vec_duplicate with small constants.  This patch duplicates that logic so we can generate the single instruction when possible.
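As a rough illustration of the change described above, the following C sketch models two things. First, the new vec_duplicate check in xxspltib_constant_p. Second, why xxspltib + vupkhsb gives the same halfword splat as a single vspltish for 5-bit values. The enum, the function names and the lane-level instruction models are hypothetical simplifications, not GCC code. The range test mirrors the patch's use of EASY_VECTOR_15, which in rs6000.h accepts -16..15.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the GCC machine modes named in the patch.  */
enum vmode { V16QI, V8HI, V4SI };

/* Same range as the EASY_VECTOR_15 macro: a 5-bit signed immediate.  */
static bool easy_vector_15 (long n)
{
  return n >= -16 && n <= 15;
}

/* Simplified model of the vec_duplicate path of xxspltib_constant_p after
   the patch.  true means "materialize with xxspltib (plus an extend)".
   false means "not an xxspltib constant; leave it to vspltisw/vspltish".  */
static bool use_xxspltib (enum vmode mode, long value)
{
  if (value < -128 || value > 127)
    return false;                    /* Does not fit in a byte at all.  */
  if (!(value >= -1 && value <= 0)   /* 0/-1 may use any VSX register.  */
      && easy_vector_15 (value)
      && (mode == V4SI || mode == V8HI))
    return false;                    /* One vspltisw/vspltish is cheaper.  */
  return true;
}

/* Lane model of vspltish: sign-extend a 5-bit immediate into 8 halfwords.  */
static void model_vspltish (int16_t out[8], int imm)
{
  int v = imm & 0x1f;
  if (v & 0x10)
    v -= 32;
  for (int i = 0; i < 8; i++)
    out[i] = (int16_t) v;
}

/* Lane model of xxspltib + vupkhsb: splat a byte into 16 lanes, then
   sign-extend 8 of those bytes to halfwords.  */
static void model_xxspltib_vupkhsb (int16_t out[8], int imm)
{
  int8_t bytes[16];
  for (int i = 0; i < 16; i++)
    bytes[i] = (int8_t) imm;
  for (int i = 0; i < 8; i++)
    out[i] = bytes[i];
}

/* Assert that both sequences agree for every 5-bit immediate.  */
static void check_equivalence (void)
{
  for (int imm = -16; imm <= 15; imm++)
    {
      int16_t a[8], b[8];
      model_vspltish (a, imm);
      model_xxspltib_vupkhsb (b, imm);
      for (int i = 0; i < 8; i++)
        assert (a[i] == b[i]);
    }
}
```

Every byte lane holds the same value, so the result does not depend on which half vupkhsb treats as "high". For that reason the model ignores endianness.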

When I did this, I ran into a problem with an existing test case.  We end up matching the *vsx_splat_v4si_internal pattern instead of falling back to the altivec_vspltisw pattern.  The constraints don't match for constant input.  To avoid this, I added a pattern ahead of this one that will match for VMX output registers and produce the vspltisw as desired.  This corrected the failing test and produces the expected code.
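The workaround relies on two facts about GCC recognition. When several define_insn patterns match an insn, the one that appears first in the machine description is used. Recognition also tests predicates, not constraints, which is why a permissive predicate can capture a constant that its constraints cannot handle. The C sketch below models this with hypothetical names: `pred_ok`, `struct splat` and the pattern enum are simplifications, not GCC data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a V4SI splat's operands.
   dest_may_be_altivec models what altivec_register_operand accepts
   (a pseudo or an Altivec hard register).  */
struct splat { bool dest_may_be_altivec; bool src_is_const; long src_value; };

enum pat { VSX_SPLAT_V4SI_ALTIVEC, VSX_SPLAT_V4SI_INTERNAL,
           ALTIVEC_VSPLTISW, NO_MATCH };

static bool s5bit (long v) { return v >= -16 && v <= 15; }

/* Predicate tests only; constraints play no part in recognition.  */
static bool pred_ok (enum pat p, const struct splat *s)
{
  switch (p)
    {
    case VSX_SPLAT_V4SI_ALTIVEC:
      return s->dest_may_be_altivec && s->src_is_const && s5bit (s->src_value);
    case VSX_SPLAT_V4SI_INTERNAL:
      return true;  /* Permissive predicate: also accepts constants.  */
    case ALTIVEC_VSPLTISW:
      return s->src_is_const && s5bit (s->src_value);
    default:
      return false;
    }
}

/* The first pattern, in file order, whose predicates accept the insn wins.  */
static enum pat recognize (const enum pat *order, size_t n,
                           const struct splat *s)
{
  for (size_t i = 0; i < n; i++)
    if (pred_ok (order[i], s))
      return order[i];
  return NO_MATCH;
}

static const enum pat without_new_pattern[] =
  { VSX_SPLAT_V4SI_INTERNAL, ALTIVEC_VSPLTISW };
static const enum pat with_new_pattern[] =
  { VSX_SPLAT_V4SI_ALTIVEC, VSX_SPLAT_V4SI_INTERNAL, ALTIVEC_VSPLTISW };

#define N_PATS(a) (sizeof (a) / sizeof ((a)[0]))

/* A splat of 5 is hidden from altivec_vspltisw unless the new pattern
   comes first.  */
static void check_order (void)
{
  struct splat five = { true, true, 5 };
  assert (recognize (without_new_pattern, N_PATS (without_new_pattern), &five)
          == VSX_SPLAT_V4SI_INTERNAL);
  assert (recognize (with_new_pattern, N_PATS (with_new_pattern), &five)
          == VSX_SPLAT_V4SI_ALTIVEC);
}
```

Non-constant input and out-of-range constants still fall through to the existing *vsx_splat_v4si_internal pattern. The new pattern therefore changes only the small-constant case.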

I've added a test case to demonstrate the code works properly now in the usual case.

Bootstrapped and tested on powerpc64le-unknown-linux-gnu.  OK for trunk, and for 6.2 after suitable burn-in?

Thanks!

Bill


[gcc]

2016-06-21  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* config/rs6000/rs6000.c (xxspltib_constant_p): Prefer vspltisw/h
	for vec_duplicate when this is cheaper.
	* config/rs6000/vsx.md (*vsx_splat_v4si_altivec): New define_insn.

[gcc/testsuite]

2016-06-21  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* gcc.target/powerpc/splat-p9-1.c: New test.

Comments

Segher Boessenkool June 21, 2016, 10:34 p.m. UTC | #1
On Tue, Jun 21, 2016 at 03:14:51PM -0500, Bill Schmidt wrote:
> I discovered recently that, with -mcpu=power9, an attempt to generate a vspltish instruction resulted instead in an xxspltib followed by a vupkhsb.  This is semantically correct but the extra instruction is not optimal.  I found that there was some logic in xxspltib_constant_p to do special casing for const_vector with small constants, but not for vec_duplicate with small constants.  This patch duplicates that logic so we can generate the single instruction when possible.

This part is okay.

> When I did this, I ran into a problem with an existing test case.  We end up matching the *vsx_splat_v4si_internal pattern instead of falling back to the altivec_vspltisw pattern.  The constraints don't match for constant input.  To avoid this, I added a pattern ahead of this one that will match for VMX output registers and produce the vspltisw as desired.  This corrected the failing test and produces the expected code.

Why does the predicate allow constant input, while the constraints do not?

> I've added a test case to demonstrate the code works properly now in the usual case.

Thanks :-)


Segher
Bill Schmidt June 21, 2016, 11:46 p.m. UTC | #2
> On Jun 21, 2016, at 5:34 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> On Tue, Jun 21, 2016 at 03:14:51PM -0500, Bill Schmidt wrote:
>> I discovered recently that, with -mcpu=power9, an attempt to generate a vspltish instruction resulted instead in an xxspltib followed by a vupkhsb.  This is semantically correct but the extra instruction is not optimal.  I found that there was some logic in xxspltib_constant_p to do special casing for const_vector with small constants, but not for vec_duplicate with small constants.  This patch duplicates that logic so we can generate the single instruction when possible.
> 
> This part is okay.
> 
>> When I did this, I ran into a problem with an existing test case.  We end up matching the *vsx_splat_v4si_internal pattern instead of falling back to the altivec_vspltisw pattern.  The constraints don't match for constant input.  To avoid this, I added a pattern ahead of this one that will match for VMX output registers and produce the vspltisw as desired.  This corrected the failing test and produces the expected code.
> 
> Why does the predicate allow constant input, while the constraints do not?

I have no idea why it was built that way.  The predicate seems to provide for all sorts of things, but this and the subsequent pattern both handle only a subset of the constraints implied by it.  To be honest, I didn't feel competent to try to fix the existing patterns.  Do you have any suggestions for what to do instead?

Thanks!
Bill

> 
>> I've added a test case to demonstrate the code works properly now in the usual case.
> 
> Thanks :-)
> 
> 
> Segher
>
Segher Boessenkool June 22, 2016, 2:22 p.m. UTC | #3
On Tue, Jun 21, 2016 at 06:46:57PM -0500, Bill Schmidt wrote:
> >> When I did this, I ran into a problem with an existing test case.  We end up matching the *vsx_splat_v4si_internal pattern instead of falling back to the altivec_vspltisw pattern.  The constraints don't match for constant input.  To avoid this, I added a pattern ahead of this one that will match for VMX output registers and produce the vspltisw as desired.  This corrected the failing test and produces the expected code.
> > 
> > Why does the predicate allow constant input, while the constraints do not?
> 
> I have no idea why it was built that way.  The predicate seems to provide for all sorts of things, but this and the subsequent pattern both handle only a subset of the constraints implied by it.  To be honest, I didn't feel competent to try to fix the existing patterns.  Do you have any suggestions for what to do instead?

Don't give up so easily?  ;-)

The predicate should be tightened, and the expander should use a new predicate
that allows all those other things.  The hardest part is figuring out a good
name for it ;-)


Segher
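Segher's suggestion can be sketched in the same modelling style. The insn keeps a tight predicate that matches only what its constraints handle. The expander accepts a broader predicate and rewrites everything else before recognition. Everything in this sketch is hypothetical: the predicate names, the operand kinds, and the assumption that the tightened insn predicate accepts registers and memory. Small constants are deliberately left untouched so that altivec_vspltisw can still match them without an extra pattern.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical operand kinds; not GCC rtx codes.  */
enum kind { OP_REG, OP_MEM, OP_CONST };

struct operand { enum kind kind; long value; };

enum pat { VSX_SPLAT_V4SI_INTERNAL, ALTIVEC_VSPLTISW, NO_MATCH };

static bool s5bit (long v) { return v >= -16 && v <= 15; }

/* Hypothetical broad predicate, used only by the expander.  */
static bool splat_expand_operand_p (const struct operand *op)
{
  return op->kind == OP_REG || op->kind == OP_MEM || op->kind == OP_CONST;
}

/* Hypothetical tightened insn predicate: only what the constraints
   handle (assumed here to be a register or memory, never a constant).  */
static bool splat_insn_operand_p (const struct operand *op)
{
  return op->kind == OP_REG || op->kind == OP_MEM;
}

/* Expander: accept the broad form, keep 5-bit constants for
   altivec_vspltisw, and force any other constant into a register
   (modeling force_reg).  */
static struct operand expand_splat_input (struct operand op)
{
  assert (splat_expand_operand_p (&op));
  if (op.kind == OP_CONST && !s5bit (op.value))
    op.kind = OP_REG;
  return op;
}

/* First match in file order, checking predicates only.  With the
   tightened predicate, the internal pattern no longer hides constants.  */
static enum pat recognize_splat (const struct operand *op)
{
  if (splat_insn_operand_p (op))
    return VSX_SPLAT_V4SI_INTERNAL;
  if (op->kind == OP_CONST && s5bit (op->value))
    return ALTIVEC_VSPLTISW;
  return NO_MATCH;
}
```

The design choice here is that the insn's predicate and constraints describe the same set of operands. Recognition then never commits to a pattern that register allocation cannot satisfy.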

Patch

Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 237619)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -6329,6 +6329,13 @@  xxspltib_constant_p (rtx op,
       value = INTVAL (element);
       if (!IN_RANGE (value, -128, 127))
 	return false;
+
+      /* See if we could generate vspltisw/vspltish directly instead of
+	 xxspltib + sign extend.  Special case 0/-1 to allow getting
+	 any VSX register instead of an Altivec register.  */
+      if (!IN_RANGE (value, -1, 0) && EASY_VECTOR_15 (value)
+	  && (mode == V4SImode || mode == V8HImode))
+	return false;
     }
 
   /* Handle (const_vector [...]).  */
Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(revision 237619)
+++ gcc/config/rs6000/vsx.md	(working copy)
@@ -2400,6 +2400,17 @@ 
     operands[1] = force_reg (<VS_scalar>mode, operands[1]);
 })
 
+;; The pattern following this one hides altivec_vspltisw, which we
+;; prefer to match when possible, so duplicate that here for
+;; TARGET_P9_VECTOR.
+(define_insn "*vsx_splat_v4si_altivec"
+  [(set (match_operand:V4SI 0 "altivec_register_operand" "=v")
+        (vec_duplicate:V4SI
+	 (match_operand:QI 1 "s5bit_cint_operand" "i")))]
+  "TARGET_P9_VECTOR"
+  "vspltisw %0,%1"
+  [(set_attr "type" "vecperm")])
+
 (define_insn "*vsx_splat_v4si_internal"
   [(set (match_operand:V4SI 0 "vsx_register_operand" "=wa,wa")
 	(vec_duplicate:V4SI
Index: gcc/testsuite/gcc.target/powerpc/splat-p9-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/splat-p9-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/splat-p9-1.c	(working copy)
@@ -0,0 +1,16 @@ 
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-maltivec -mcpu=power9" } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-final { scan-assembler "vspltish" } } */
+/* { dg-final { scan-assembler-not "xxspltib" } } */
+
+/* Make sure we don't use an inefficient sequence for small integer splat.  */
+
+#include <altivec.h>
+
+vector short
+foo ()
+{
+  return vec_splat_s16 (5);
+}