From patchwork Thu Sep 4 04:28:16 2014
X-Patchwork-Submitter: Bill Schmidt
X-Patchwork-Id: 385721
Message-ID: <1409804896.3163.55.camel@gnopaine>
Subject: [PATCH, rs6000] Handle vec_extract and splat patterns in analyze_swaps
From: Bill Schmidt
To: gcc-patches@gcc.gnu.org
Cc: dje.gcc@gmail.com
Date: Wed, 03 Sep 2014 23:28:16 -0500

Hi,

This patch adds more special handling to analyze_swaps so that more
computations can be optimized.  Previously I had disallowed VEC_SELECT in
all cases.  This is now changed to allow a select of a single lane, either
for an extract operation or for a splat operation.  If a computation
containing such operations is optimized, the selected lane is changed to
count from the other end of the vector.  Several new tests are added to
check that these opportunities are now exploited.
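To make the lane arithmetic concrete: once the element-order swaps are
removed on little-endian, a select of lane i from an N-element vector must
become a select of lane N-1-i, which is the GEN_INT (nunits - 1 - INTVAL
(...)) computation in adjust_extract below.  The following stand-alone
sketch is illustration only and is not part of the patch; the helper name
mirror_lane is made up for this example.

/* Illustration only: the lane renumbering performed by adjust_extract,
   expressed on plain integers.  mirror_lane is a hypothetical helper
   used just for this sketch, not something in the patch.  */
#include <assert.h>

static int
mirror_lane (int nunits, int lane)
{
  /* Same formula as GEN_INT (nunits - 1 - INTVAL (...)) in adjust_extract.  */
  return nunits - 1 - lane;
}

int
main (void)
{
  /* V2DI/V2DF extract: lane 0 becomes lane 1, and vice versa.  */
  assert (mirror_lane (2, 0) == 1);
  assert (mirror_lane (2, 1) == 0);
  /* V4SF splat of lane 1 becomes a splat of lane 2.  */
  assert (mirror_lane (4, 1) == 2);
  return 0;
}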
Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
regressions.  Is this ok for trunk?

Thanks,
Bill


[gcc]

2014-09-03  Bill Schmidt

	* config/rs6000/rs6000.c (special_handling_values): Add SH_EXTRACT.
	(rtx_is_swappable_p): Look for patterns with a VEC_SELECT, perhaps
	wrapped in a VEC_DUPLICATE, representing an extract.  Mark these as
	swappable with special handling SH_EXTRACT.  Remove
	UNSPEC_VSX_XXSPLTW from the list of disallowed unspecs for the
	optimization.
	(adjust_extract): New function.
	(handle_special_swappables): Add default to case statement; add
	case for SH_EXTRACT that calls adjust_extract.
	(dump_swap_insn_table): Handle SH_EXTRACT.

[gcc/testsuite]

2014-09-03  Bill Schmidt

	* gcc.target/powerpc/swaps-p8-13.c: New test.
	* gcc.target/powerpc/swaps-p8-14.c: New test.
	* gcc.target/powerpc/swaps-p8-15.c: New test.


Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 214879)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -33562,7 +33562,8 @@ enum special_handling_values {
   SH_CONST_VECTOR,
   SH_SUBREG,
   SH_NOSWAP_LD,
-  SH_NOSWAP_ST
+  SH_NOSWAP_ST,
+  SH_EXTRACT
 };
 
 /* Union INSN with all insns containing definitions that reach USE.
@@ -33704,6 +33705,7 @@ rtx_is_swappable_p (rtx op, unsigned int *special)
 {
   enum rtx_code code = GET_CODE (op);
   int i, j;
+  rtx parallel;
 
   switch (code)
     {
@@ -33714,7 +33716,6 @@ rtx_is_swappable_p (rtx op, unsigned int *special)
       return 1;
 
     case VEC_CONCAT:
-    case VEC_SELECT:
     case ASM_INPUT:
     case ASM_OPERANDS:
       return 0;
@@ -33732,9 +33733,31 @@ rtx_is_swappable_p (rtx op, unsigned int *special)
         handling.  */
       if (GET_CODE (XEXP (op, 0)) == CONST_INT)
        return 1;
+      else if (GET_CODE (XEXP (op, 0)) == REG
+              && GET_MODE_INNER (GET_MODE (op)) == GET_MODE (XEXP (op, 0)))
+       /* This catches V2DF and V2DI splat, at a minimum.  */
+       return 1;
+      else if (GET_CODE (XEXP (op, 0)) == VEC_SELECT)
+       /* If the duplicated item is from a select, defer to the select
+          processing to see if we can change the lane for the splat.  */
+       return rtx_is_swappable_p (XEXP (op, 0), special);
       else
        return 0;
 
+    case VEC_SELECT:
+      /* A vec_extract operation is ok if we change the lane.  */
+      if (GET_CODE (XEXP (op, 0)) == REG
+         && GET_MODE_INNER (GET_MODE (XEXP (op, 0))) == GET_MODE (op)
+         && GET_CODE ((parallel = XEXP (op, 1))) == PARALLEL
+         && XVECLEN (parallel, 0) == 1
+         && GET_CODE (XVECEXP (parallel, 0, 0)) == CONST_INT)
+       {
+         *special = SH_EXTRACT;
+         return 1;
+       }
+      else
+       return 0;
+
     case UNSPEC:
       {
        /* Various operations are unsafe for this optimization, at least
@@ -33777,7 +33800,6 @@ rtx_is_swappable_p (rtx op, unsigned int *special)
           || val == UNSPEC_VSX_CVSPDPN
           || val == UNSPEC_VSX_SET
           || val == UNSPEC_VSX_SLDWI
-          || val == UNSPEC_VSX_XXSPLTW
           || val == UNSPEC_VUNPACK_HI_SIGN
           || val == UNSPEC_VUNPACK_HI_SIGN_DIRECT
           || val == UNSPEC_VUNPACK_LO_SIGN
@@ -34115,6 +34137,27 @@ permute_store (rtx_insn *insn)
            INSN_UID (insn));
 }
 
+/* Given OP that contains a vector extract operation, change the index
+   of the extracted lane to count from the other side of the vector.  */
+static void
+adjust_extract (rtx_insn *insn)
+{
+  rtx body = PATTERN (insn);
+  /* The vec_select may be wrapped in a vec_duplicate for a splat, so
+     account for that.  */
+  rtx sel = (GET_CODE (body) == VEC_DUPLICATE
+            ? XEXP (XEXP (body, 0), 1)
+            : XEXP (body, 1));
+  rtx par = XEXP (sel, 1);
+  int nunits = GET_MODE_NUNITS (GET_MODE (XEXP (sel, 0)));
+  XVECEXP (par, 0, 0) = GEN_INT (nunits - 1 - INTVAL (XVECEXP (par, 0, 0)));
+  INSN_CODE (insn) = -1;	/* Force re-recognition.  */
+  df_insn_rescan (insn);
+
+  if (dump_file)
+    fprintf (dump_file, "Changing lane for extract %d\n", INSN_UID (insn));
+}
+
 /* The insn described by INSN_ENTRY[I] can be swapped, but only with
    special handling.  Take care of that here.  */
 static void
@@ -34125,6 +34168,8 @@ handle_special_swappables (swap_web_entry *insn_en
 
   switch (insn_entry[i].special_handling)
     {
+    default:
+      gcc_unreachable ();
     case SH_CONST_VECTOR:
       {
        /* A CONST_VECTOR will only show up somewhere in the RHS of a SET.  */
@@ -34151,6 +34196,9 @@ handle_special_swappables (swap_web_entry *insn_en
       /* Convert a non-permuting store to a permuting one.  */
       permute_store (insn);
       break;
+    case SH_EXTRACT:
+      /* Change the lane on an extract operation.  */
+      adjust_extract (insn);
     }
 }
 
@@ -34219,6 +34267,8 @@ dump_swap_insn_table (swap_web_entry *insn_entry)
            fputs ("special:load ", dump_file);
          else if (insn_entry[i].special_handling == SH_NOSWAP_ST)
            fputs ("special:store ", dump_file);
+         else if (insn_entry[i].special_handling == SH_EXTRACT)
+           fputs ("special:extract ", dump_file);
        }
       if (insn_entry[i].web_not_optimizable)
        fputs ("unoptimizable ", dump_file);
Index: gcc/testsuite/gcc.target/powerpc/swaps-p8-13.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/swaps-p8-13.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/swaps-p8-13.c	(working copy)
@@ -0,0 +1,53 @@
+/* { dg-do run { target { powerpc64le-*-* } } } */
+/* { dg-options "-mcpu=power8 -O3" } */
+
+#include <altivec.h>
+void abort ();
+
+#define N 4096
+long long ca[N] __attribute__((aligned(16)));
+long long cb[N] __attribute__((aligned(16)));
+long long cc[N] __attribute__((aligned(16)));
+long long cd[N] __attribute__((aligned(16)));
+long long x;
+
+__attribute__((noinline)) void foo ()
+{
+  int i;
+  vector long long va, vb, vc, vd, tmp;
+  volatile unsigned long long three = 3;
+  vector unsigned long long threes = vec_splats (three);
+  for (i = 0; i < N; i+=2) {
+    vb = vec_vsx_ld (0, (vector long long *)&cb[i]);
+    vc = vec_vsx_ld (0, (vector long long *)&cc[i]);
+    vd = vec_vsx_ld (0, (vector long long *)&cd[i]);
+    tmp = vec_add (vb, vc);
+    tmp = vec_sub (tmp, vd);
+    tmp = vec_sra (tmp, threes);
+    x = vec_extract (tmp, 0);
+    vec_vsx_st (tmp, 0, (vector long long *)&ca[i]);
+  }
+}
+
+__attribute__((noinline)) void init ()
+{
+  int i;
+  for (i = 0; i < N; ++i) {
+    cb[i] = 3 * i - 2048;
+    cc[i] = -5 * i + 93;
+    cd[i] = i + 14;
+  }
+}
+
+int main ()
+{
+  int i;
+  init ();
+  foo ();
+  for (i = 0; i < N; ++i)
+    if (ca[i] != (-3 * i - 1969) >> 3)
+      abort ();
+  if (x != ca[N-1])
+    abort ();
+  return 0;
+}
Index: gcc/testsuite/gcc.target/powerpc/swaps-p8-14.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/swaps-p8-14.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/swaps-p8-14.c	(working copy)
@@ -0,0 +1,42 @@
+/* { dg-do compile { target { powerpc64le-*-* } } } */
+/* { dg-options "-mcpu=power8 -O3" } */
+/* { dg-final { scan-assembler "lxvd2x" } } */
+/* { dg-final { scan-assembler "stxvd2x" } } */
+/* { dg-final { scan-assembler "stxsdx" } } */
+/* { dg-final { scan-assembler-times "xxpermdi" 1 } } */
+
+/* The only xxpermdi expected is for the vec_splats.  */
+
+#include <altivec.h>
+void abort ();
+
+#define N 4096
+long long ca[N] __attribute__((aligned(16)));
+long long cb[N] __attribute__((aligned(16)));
+long long cc[N] __attribute__((aligned(16)));
+long long cd[N] __attribute__((aligned(16)));
+long long x;
+
+__attribute__((noinline)) void foo ()
+{
+  int i;
+  vector long long va, vb, vc, vd, tmp;
+  volatile unsigned long long three = 3;
+  vector unsigned long long threes = vec_splats (three);
+  for (i = 0; i < N; i+=2) {
+    vb = vec_vsx_ld (0, (vector long long *)&cb[i]);
+    vc = vec_vsx_ld (0, (vector long long *)&cc[i]);
+    vd = vec_vsx_ld (0, (vector long long *)&cd[i]);
+    tmp = vec_add (vb, vc);
+    tmp = vec_sub (tmp, vd);
+    tmp = vec_sra (tmp, threes);
+    x = vec_extract (tmp, 0);
+    vec_vsx_st (tmp, 0, (vector long long *)&ca[i]);
+  }
+}
+
+int main ()
+{
+  foo ();
+  return 0;
+}
Index: gcc/testsuite/gcc.target/powerpc/swaps-p8-15.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/swaps-p8-15.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/swaps-p8-15.c	(working copy)
@@ -0,0 +1,49 @@
+/* { dg-do compile { target { powerpc64le-*-* } } } */
+/* { dg-options "-mcpu=power8 -O3" } */
+/* { dg-final { scan-assembler "lxvd2x" } } */
+/* { dg-final { scan-assembler "stxvd2x" } } */
+/* { dg-final { scan-assembler "xxspltw" } } */
+/* { dg-final { scan-assembler-not "xxpermdi" } } */
+
+#include <altivec.h>
+void abort();
+
+typedef struct xx {vector double l; vector double h;} xx;
+
+#define N 4096
+#define M 10000000
+vector float ca[N][4] = {0};
+vector float cb[N][4] = {0};
+vector float cc[N][4] = {0};
+
+__attribute__((noinline)) void foo ()
+{
+  int i;
+  for (i = 0; i < N; i++) {
+    cc[i][0] = vec_mul(vec_splats(cb[i][0][0]), ca[i][0]);
+    cc[i][0] = vec_madd(cc[i][0],vec_splats(cb[i][0][1]), ca[i][1]);
+    cc[i][0] = vec_madd(cc[i][0],vec_splats(cb[i][0][2]), ca[i][2]);
+    cc[i][0] = vec_madd(cc[i][0],vec_splats(cb[i][0][3]), ca[i][3]);
+
+    cc[i][1] = vec_mul(vec_splats(cb[i][1][0]), ca[i][0]);
+    cc[i][1] = vec_madd(cc[i][0],vec_splats(cb[i][1][1]), ca[i][1]);
+    cc[i][1] = vec_madd(cc[i][0],vec_splats(cb[i][1][2]), ca[i][2]);
+    cc[i][1] = vec_madd(cc[i][0],vec_splats(cb[i][1][3]), ca[i][3]);
+
+    cc[i][2] = vec_mul(vec_splats(cb[i][2][0]), ca[i][0]);
+    cc[i][2] = vec_madd(cc[i][0],vec_splats(cb[i][2][1]), ca[i][1]);
+    cc[i][2] = vec_madd(cc[i][0],vec_splats(cb[i][2][2]), ca[i][2]);
+    cc[i][2] = vec_madd(cc[i][0],vec_splats(cb[i][2][3]), ca[i][3]);
+
+    cc[i][3] = vec_mul(vec_splats(cb[i][3][0]), ca[i][0]);
+    cc[i][3] = vec_madd(cc[i][0],vec_splats(cb[i][3][1]), ca[i][1]);
+    cc[i][3] = vec_madd(cc[i][0],vec_splats(cb[i][3][2]), ca[i][2]);
+    cc[i][3] = vec_madd(cc[i][0],vec_splats(cb[i][3][3]), ca[i][3]);
+  }
+}
+
+int main ()
+{
+  foo ();
+  return 0;
+}