From patchwork Sun Feb 10 16:10:02 2019
X-Patchwork-Submitter: Bill Schmidt
X-Patchwork-Id: 1039441
To: GCC Patches
Cc: Segher Boessenkool
From: Bill Schmidt
Subject: [PATCH] rs6000: Vector shift-right should honor modulo semantics
Date: Sun, 10 Feb 2019 10:10:02 -0600

Hi!

We had a problem report for code attempting to implement a vector
right-shift for a vector long long (V2DImode) type.  The programmer
noted that we don't have a vector splat-immediate for this mode, but
cleverly realized he could use a vector char splat-immediate, since only
the lower 6 bits of the shift count are read.  This is a documented
feature of both the vector shift built-ins and the underlying
instruction.
Starting with GCC 8, the vector shifts are expanded early in
rs6000_gimple_fold_builtin.  However, the GIMPLE folding does not
currently perform the required TRUNC_MOD_EXPR to implement the built-in
semantics.  It appears that this was caught earlier for vector
shift-left and fixed, but the same problem was not fixed for vector
shift-right.  This patch fixes that.

While fixing that problem, I noted that we get inferior code generation
when we try to fold the vector char splat earlier, due to the type
mismatch and some additional optimizations performed in the middle end.
Because this is a rare circumstance, it makes sense to avoid the GIMPLE
folding in this case and allow the back end to do the expansion; this
produces the clean code we're expecting, with the vspltisb intact.

I've added executable tests for both shift-right algebraic and
shift-right logical.  Both fail prior to applying the patch, and work
correctly afterwards.

Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
regressions.  Is this okay for trunk, and for GCC 8.3 if there is no
fallout by the end of the week?

Thanks,
Bill

[gcc]

2019-02-08  Bill Schmidt

	* config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Shift-right
	and shift-left vector built-ins need to include a TRUNC_MOD_EXPR
	for correct semantics.  Also, don't expand a vector-splat if there
	is a type mismatch; let the back end handle it.

[gcc/testsuite]

2019-02-08  Bill Schmidt

	* gcc.target/powerpc/srad-modulo.c: New.
	* gcc.target/powerpc/srd-modulo.c: New.
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 268707)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -15735,13 +15735,37 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *
     case ALTIVEC_BUILTIN_VSRAH:
     case ALTIVEC_BUILTIN_VSRAW:
     case P8V_BUILTIN_VSRAD:
-      arg0 = gimple_call_arg (stmt, 0);
-      arg1 = gimple_call_arg (stmt, 1);
-      lhs = gimple_call_lhs (stmt);
-      g = gimple_build_assign (lhs, RSHIFT_EXPR, arg0, arg1);
-      gimple_set_location (g, gimple_location (stmt));
-      gsi_replace (gsi, g, true);
-      return true;
+      {
+	arg0 = gimple_call_arg (stmt, 0);
+	arg1 = gimple_call_arg (stmt, 1);
+	lhs = gimple_call_lhs (stmt);
+	tree arg1_type = TREE_TYPE (arg1);
+	tree unsigned_arg1_type = unsigned_type_for (TREE_TYPE (arg1));
+	tree unsigned_element_type = unsigned_type_for (TREE_TYPE (arg1_type));
+	location_t loc = gimple_location (stmt);
+	/* Force arg1 into the range valid matching the arg0 type.  */
+	/* Build a vector consisting of the max valid bit-size values.  */
+	int n_elts = VECTOR_CST_NELTS (arg1);
+	tree element_size = build_int_cst (unsigned_element_type,
+					   128 / n_elts);
+	tree_vector_builder elts (unsigned_arg1_type, n_elts, 1);
+	for (int i = 0; i < n_elts; i++)
+	  elts.safe_push (element_size);
+	tree modulo_tree = elts.build ();
+	/* Modulo the provided shift value against that vector.  */
+	gimple_seq stmts = NULL;
+	tree unsigned_arg1 = gimple_build (&stmts, VIEW_CONVERT_EXPR,
+					   unsigned_arg1_type, arg1);
+	tree new_arg1 = gimple_build (&stmts, loc, TRUNC_MOD_EXPR,
+				      unsigned_arg1_type, unsigned_arg1,
+				      modulo_tree);
+	gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
+	/* And finally, do the shift.  */
+	g = gimple_build_assign (lhs, RSHIFT_EXPR, arg0, new_arg1);
+	gimple_set_location (g, loc);
+	gsi_replace (gsi, g, true);
+	return true;
+      }
     /* Flavors of vector shift left.
        builtin_altivec_vsl{b,h,w} -> vsl{b,h,w}.  */
     case ALTIVEC_BUILTIN_VSLB:
@@ -15795,14 +15819,34 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *
       arg0 = gimple_call_arg (stmt, 0);
       arg1 = gimple_call_arg (stmt, 1);
       lhs = gimple_call_lhs (stmt);
+      tree arg1_type = TREE_TYPE (arg1);
+      tree unsigned_arg1_type = unsigned_type_for (TREE_TYPE (arg1));
+      tree unsigned_element_type = unsigned_type_for (TREE_TYPE (arg1_type));
+      location_t loc = gimple_location (stmt);
       gimple_seq stmts = NULL;
       /* Convert arg0 to unsigned.  */
       tree arg0_unsigned = gimple_build (&stmts, VIEW_CONVERT_EXPR,
					  unsigned_type_for (TREE_TYPE (arg0)),
					  arg0);
+      /* Force arg1 into the range valid matching the arg0 type.  */
+      /* Build a vector consisting of the max valid bit-size values.  */
+      int n_elts = VECTOR_CST_NELTS (arg1);
+      tree element_size = build_int_cst (unsigned_element_type,
+					 128 / n_elts);
+      tree_vector_builder elts (unsigned_arg1_type, n_elts, 1);
+      for (int i = 0; i < n_elts; i++)
+	elts.safe_push (element_size);
+      tree modulo_tree = elts.build ();
+      /* Modulo the provided shift value against that vector.  */
+      tree unsigned_arg1 = gimple_build (&stmts, VIEW_CONVERT_EXPR,
+					 unsigned_arg1_type, arg1);
+      tree new_arg1 = gimple_build (&stmts, loc, TRUNC_MOD_EXPR,
+				    unsigned_arg1_type, unsigned_arg1,
+				    modulo_tree);
+      /* Do the shift.  */
       tree res = gimple_build (&stmts, RSHIFT_EXPR,
-			       TREE_TYPE (arg0_unsigned), arg0_unsigned, arg1);
+			       TREE_TYPE (arg0_unsigned), arg0_unsigned,
+			       new_arg1);
       /* Convert result back to the lhs type.  */
       res = gimple_build (&stmts, VIEW_CONVERT_EXPR, TREE_TYPE (lhs), res);
       gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
@@ -16061,7 +16105,7 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *
     case ALTIVEC_BUILTIN_VSPLTISH:
     case ALTIVEC_BUILTIN_VSPLTISW:
       {
-	int size;
+	unsigned int size;
	if (fn_code == ALTIVEC_BUILTIN_VSPLTISB)
	  size = 8;
	else if (fn_code == ALTIVEC_BUILTIN_VSPLTISH)
@@ -16072,6 +16116,13 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *

	arg0 = gimple_call_arg (stmt, 0);
	lhs = gimple_call_lhs (stmt);

+	/* It's sometimes useful to use one of these to build a
+	   splat for V2DImode, since the upper bits will be ignored.
+	   Don't fold if we detect that situation, as we end up
+	   losing the splat instruction in this case.  */
+	if (size != TREE_INT_CST_LOW (TYPE_SIZE (TREE_TYPE (TREE_TYPE (lhs)))))
+	  return false;
+
	/* Only fold the vec_splat_*() if the lower bits of arg 0 is a
	   5-bit signed constant in range -16 to +15.  */
	if (TREE_CODE (arg0) != INTEGER_CST
Index: gcc/testsuite/gcc.target/powerpc/srad-modulo.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/srad-modulo.c	(nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/srad-modulo.c	(working copy)
@@ -0,0 +1,43 @@
+/* Test that using a character splat to set up a shift-right algebraic
+   for a doubleword vector works correctly after gimple folding.  */
+
+/* { dg-do run { target { powerpc64*-*-* && vsx_hw } } } */
+/* { dg-options { "-O3" } } */
+
+#include <altivec.h>
+
+typedef __vector unsigned long long vui64_t;
+typedef __vector long long vi64_t;
+
+static inline vi64_t
+vec_sradi (vi64_t vra, const unsigned int shb)
+{
+  vui64_t rshift;
+  vi64_t result;
+
+  /* Note legitimate use of wrong-type splat due to expectation that only
+     lower 6-bits are read.  */
+  rshift = (vui64_t) vec_splat_s8 ((const unsigned char) shb);
+
+  /* Vector Shift Right Algebraic Doublewords based on the lower 6-bits
+     of corresponding element of rshift.  */
+  result = vec_vsrad (vra, rshift);
+
+  return (vi64_t) result;
+}
+
+vi64_t
+test_sradi_4 (vi64_t a)
+{
+  return vec_sradi (a, 4);
+}
+
+int
+main ()
+{
+  vi64_t x = {-256, 1025};
+  x = test_sradi_4 (x);
+  if (x[0] != -16 || x[1] != 64)
+    __builtin_abort ();
+  return 0;
+}
Index: gcc/testsuite/gcc.target/powerpc/srd-modulo.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/srd-modulo.c	(nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/srd-modulo.c	(working copy)
@@ -0,0 +1,42 @@
+/* Test that using a character splat to set up a shift-right logical
+   for a doubleword vector works correctly after gimple folding.  */
+
+/* { dg-do run { target { powerpc64*-*-* && vsx_hw } } } */
+/* { dg-options { "-O3" } } */
+
+#include <altivec.h>
+
+typedef __vector unsigned long long vui64_t;
+
+static inline vui64_t
+vec_sradi (vui64_t vra, const unsigned int shb)
+{
+  vui64_t rshift;
+  vui64_t result;
+
+  /* Note legitimate use of wrong-type splat due to expectation that only
+     lower 6-bits are read.  */
+  rshift = (vui64_t) vec_splat_s8 ((const unsigned char) shb);
+
+  /* Vector Shift Right [Logical] Doublewords based on the lower 6-bits
+     of corresponding element of rshift.  */
+  result = vec_vsrd (vra, rshift);
+
+  return (vui64_t) result;
+}
+
+vui64_t
+test_sradi_4 (vui64_t a)
+{
+  return vec_sradi (a, 4);
+}
+
+int
+main ()
+{
+  vui64_t x = {1992357, 1025};
+  x = test_sradi_4 (x);
+  if (x[0] != 124522 || x[1] != 64)
+    __builtin_abort ();
+  return 0;
+}