From patchwork Thu Jan  4 14:16:06 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Bill Schmidt <wschmidt@linux.vnet.ibm.com>
X-Patchwork-Id: 855609
Return-Path: 
 <gcc-patches-return-470139-incoming=patchwork.ozlabs.org@gcc.gnu.org>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Authentication-Results: ozlabs.org;
	spf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org
	(client-ip=209.132.180.131; helo=sourceware.org;
	envelope-from=gcc-patches-return-470139-incoming=patchwork.ozlabs.org@gcc.gnu.org;
	receiver=<UNKNOWN>)
Authentication-Results: ozlabs.org; dkim=pass (1024-bit key;
	unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org
	header.b="MhhURPsY"; dkim-atps=neutral
Received: from sourceware.org (server1.sourceware.org [209.132.180.131])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256
	bits)) (No client certificate requested)
	by ozlabs.org (Postfix) with ESMTPS id 3zC8x34p2Bz9t3V
	for <incoming@patchwork.ozlabs.org>;
	Fri,  5 Jan 2018 01:16:22 +1100 (AEDT)
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender:to:cc
	:from:subject:date:mime-version:content-type
	:content-transfer-encoding:message-id; q=dns; s=default; b=tmRF+
	5UFDNtv537f/ZM+97/mNuO0mlBw8KnL8qa8JwUot6i9GoDNA66m8tpLprq1VQdXq
	5aSnjkAJ0jLil2kVKk1ZiteS31vyC2W7SXZHX+WcH/PHffaP2pn7z7fWRUdqfkpa
	4Qb0rEmDv5MrMiBJ3S+f9EDyVtYiGiZhBVwyIM=
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender:to:cc
	:from:subject:date:mime-version:content-type
	:content-transfer-encoding:message-id; s=default; bh=vdKl6g3fkvp
	rHQzGra+y948WWic=; b=MhhURPsY1V3eQsrIr9+c8cW4B5EZ/h5TpKzCnHKDdFL
	0Lidz5mMQkov4Dpazv+zs30l0AQ25aUF8cdt5O+r4tFdAguHOrUM/XqgiuYbVSFh
	8hussP4mO243q2K2GVh0A8f88EdlqqcJJlbD6d+4Z189777yPU3LJg369FxSbfv8
	=
Received: (qmail 71305 invoked by alias); 4 Jan 2018 14:16:14 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Unsubscribe: 
 <mailto:gcc-patches-unsubscribe-incoming=patchwork.ozlabs.org@gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org
Received: (qmail 71296 invoked by uid 89); 4 Jan 2018 14:16:13 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-10.6 required=5.0 tests=AWL, BAYES_00,
	GIT_PATCH_2, GIT_PATCH_3, KAM_ASCII_DIVIDERS,
	KAM_LAZY_DOMAIN_SECURITY, KAM_SHORT,
	RCVD_IN_DNSWL_LOW autolearn=ham version=3.3.2 spammy=v0,
	altivec.md, UD:altivec.md, altivecmd
X-HELO: mx0a-001b2d01.pphosted.com
Received: from mx0b-001b2d01.pphosted.com (HELO mx0a-001b2d01.pphosted.com)
	(148.163.158.5) by sourceware.org
	(qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP;
	Thu, 04 Jan 2018 14:16:11 +0000
Received: from pps.filterd (m0098416.ppops.net [127.0.0.1])	by
	mx0b-001b2d01.pphosted.com (8.16.0.21/8.16.0.21) with SMTP id
	w04EDojm118431	for <gcc-patches@gcc.gnu.org>;
	Thu, 4 Jan 2018 09:16:10 -0500
Received: from e13.ny.us.ibm.com (e13.ny.us.ibm.com [129.33.205.203])	by
	mx0b-001b2d01.pphosted.com with ESMTP id
	2f9kxena7s-1	(version=TLSv1.2 cipher=AES256-SHA bits=256
	verify=NOT)	for <gcc-patches@gcc.gnu.org>;
	Thu, 04 Jan 2018 09:16:09 -0500
Received: from localhost	by e13.ny.us.ibm.com with IBM ESMTP SMTP Gateway:
	Authorized Use Only! Violators will be prosecuted	for
	<gcc-patches@gcc.gnu.org> from <wschmidt@linux.vnet.ibm.com>;
	Thu, 4 Jan 2018 09:16:08 -0500
Received: from b01cxnp23034.gho.pok.ibm.com (9.57.198.29)	by
	e13.ny.us.ibm.com (146.89.104.200) with IBM ESMTP SMTP
	Gateway: Authorized Use Only! Violators will be prosecuted;
	Thu, 4 Jan 2018 09:16:07 -0500
Received: from b01ledav004.gho.pok.ibm.com (b01ledav004.gho.pok.ibm.com
	[9.57.199.109])	by b01cxnp23034.gho.pok.ibm.com
	(8.14.9/8.14.9/NCO v10.0) with ESMTP id w04EG62u46923790;
	Thu, 4 Jan 2018 14:16:06 GMT
Received: from b01ledav004.gho.pok.ibm.com (unknown [127.0.0.1])	by IMSVA
	(Postfix) with ESMTP id 2D2C6112047;
	Thu,  4 Jan 2018 09:14:19 -0500 (EST)
Received: from bigmac.rchland.ibm.com (unknown [9.10.86.85])	by
	b01ledav004.gho.pok.ibm.com (Postfix) with ESMTP id
	EA701112034; Thu,  4 Jan 2018 09:14:18 -0500 (EST)
To: GCC Patches <gcc-patches@gcc.gnu.org>
Cc: Segher Boessenkool <segher@kernel.crashing.org>,
	David Edelsohn <dje.gcc@gmail.com>
From: Bill Schmidt <wschmidt@linux.vnet.ibm.com>
Subject: [PATCH, rs6000] Fix PR83677 (incorrect generation of xxpermr)
Date: Thu, 4 Jan 2018 08:16:06 -0600
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13;
	rv:52.0) Gecko/20100101 Thunderbird/52.5.0
MIME-Version: 1.0
X-TM-AS-GCONF: 00
x-cbid: 18010414-0008-0000-0000-000002B92000
X-IBM-SpamModules-Scores: 
X-IBM-SpamModules-Versions: BY=3.00008317; HX=3.00000241; KW=3.00000007;
	PH=3.00000004; SC=3.00000244; SDB=6.00970142; UDB=6.00491319;
	IPR=6.00750113; BA=6.00005765; NDR=6.00000001; ZLA=6.00000005;
	ZF=6.00000009; ZB=6.00000000; ZP=6.00000000; ZH=6.00000000;
	ZU=6.00000002; MB=3.00018869; XFM=3.00000015;
	UTC=2018-01-04 14:16:08
X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused
x-cbparentid: 18010414-0009-0000-0000-000037C8C7B5
Message-Id: <00763fdd-3674-d40d-867a-e2425ccf7808@linux.vnet.ibm.com>
X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:, ,
	definitions=2018-01-04_06:, , signatures=0
X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0
	priorityscore=1501 malwarescore=0 suspectscore=0 phishscore=0
	bulkscore=0 spamscore=0 clxscore=1015 lowpriorityscore=0
	impostorscore=0 adultscore=0 classifier=spam adjust=0
	reason=mlx scancount=1 engine=8.0.1-1709140000
	definitions=main-1801040197
X-IsSubscribed: yes

Hi,

https://gcc.gnu.org/PR83677 reports that xxpermr is always generated
incorrectly: the pattern effectively emits the two input registers in the
opposite order from what the instruction requires.  This patch corrects
the operand order and adds a test case adapted from the original report.
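
For illustration only (not part of the patch), here is a scalar C sketch
of why the operand order matters.  The function names are hypothetical.
The source ordering used here (VRA || VRB for vpermr, XA || XT for
xxpermr, with XT also serving as the result) is an assumption taken from
the Power ISA 3.0 descriptions of these instructions, so treat this as a
rough model rather than a definitive statement of the hardware behavior.

```c
#include <assert.h>
#include <string.h>

/* Right-indexed permute over the 32-byte concatenation FIRST || SECOND:
   result byte i is byte (31 - index) of that concatenation, where index
   is the low five bits of control byte i.  The sources are copied before
   any result byte is written, so OUT may alias either source.  */
void
permr_model (const unsigned char first[16], const unsigned char second[16],
	     const unsigned char control[16], unsigned char out[16])
{
  unsigned char src[32];
  memcpy (src, first, 16);
  memcpy (src + 16, second, 16);
  for (int i = 0; i < 16; i++)
    out[i] = src[31 - (control[i] & 0x1f)];
}

/* Assumed model of vpermr VRT,VRA,VRB,VRC: sources are VRA || VRB.  */
void
vpermr_model (unsigned char vrt[16], const unsigned char vra[16],
	      const unsigned char vrb[16], const unsigned char vrc[16])
{
  permr_model (vra, vrb, vrc, vrt);
}

/* Assumed model of xxpermr XT,XA,XB: sources are XA || XT, and XT is
   overwritten with the result.  */
void
xxpermr_model (unsigned char xt[16], const unsigned char xa[16],
	       const unsigned char xb[16])
{
  permr_model (xa, xt, xb, xt);
}

/* Compare both versions of the second alternative against the first
   alternative, "vpermr %0,%2,%1,%3".  Aborts if the model disagrees
   with the analysis in the patch description.  */
void
check_operand_order (void)
{
  unsigned char op1[16], op2[16], op3[16];
  unsigned char want[16], old_out[16], new_out[16];

  for (int i = 0; i < 16; i++)
    {
      op1[i] = (unsigned char) i;
      op2[i] = (unsigned char) (100 + i);
      op3[i] = (unsigned char) (2 * i);
    }

  /* First alternative: vpermr %0,%2,%1,%3.  */
  vpermr_model (want, op2, op1, op3);

  /* Old second alternative: operand 2 tied to the output ("0"),
     emitted as xxpermr %x0,%x1,%x3.  */
  memcpy (old_out, op2, 16);
  xxpermr_model (old_out, op1, op3);

  /* New second alternative: operand 1 tied to the output ("0"),
     emitted as xxpermr %x0,%x2,%x3.  */
  memcpy (new_out, op1, 16);
  xxpermr_model (new_out, op2, op3);

  assert (memcmp (new_out, want, 16) == 0);
  assert (memcmp (old_out, want, 16) != 0);
}
```

Under these assumptions, calling check_operand_order () from a small
main () returns silently: the new alternative agrees with the vpermr
alternative, while the old one selects bytes from swapped sources.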

Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.
Is this okay for trunk and, shortly afterward, for backport to GCC 7?  I
will check GCC 6 as well, but I'm fairly certain the bug was introduced in
GCC 7, since GCC 6 has only minimal POWER9 support.

Thanks,
Bill


[gcc]

2018-01-04  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	PR target/83677
	* config/rs6000/altivec.md (*altivec_vpermr_<mode>_internal):
	Reverse operand 1 and 2 constraints for second alternative; output
	operand 2 in second position rather than operand 1.

[gcc/testsuite]

2018-01-04  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	PR target/83677
	* gcc.target/powerpc/pr83677.c: New file.

Index: gcc/config/rs6000/altivec.md
===================================================================
--- gcc/config/rs6000/altivec.md	(revision 256216)
+++ gcc/config/rs6000/altivec.md	(working copy)
@@ -2200,14 +2200,14 @@
 
 (define_insn "*altivec_vpermr_<mode>_internal"
   [(set (match_operand:VM 0 "register_operand" "=v,?wo")
-	(unspec:VM [(match_operand:VM 1 "register_operand" "v,wo")
-		    (match_operand:VM 2 "register_operand" "v,0")
+	(unspec:VM [(match_operand:VM 1 "register_operand" "v,0")
+		    (match_operand:VM 2 "register_operand" "v,wo")
 		    (match_operand:V16QI 3 "register_operand" "v,wo")]
 		   UNSPEC_VPERMR))]
   "TARGET_P9_VECTOR"
   "@
    vpermr %0,%2,%1,%3
-   xxpermr %x0,%x1,%x3"
+   xxpermr %x0,%x2,%x3"
   [(set_attr "type" "vecperm")
    (set_attr "length" "4")])
 
Index: gcc/testsuite/gcc.target/powerpc/pr83677.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr83677.c	(nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/pr83677.c	(working copy)
@@ -0,0 +1,166 @@
+/* { dg-do run { target { powerpc64*-*-* && { lp64 && p9vector_hw } } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-options "-mcpu=power9 -O2 " } */
+
+/* PR83677: This test case used to fail due to mis-generation of the
+   xxpermr instruction.  It requires inlining to create enough register
+   pressure that we generate xxpermr rather than vpermr.  */
+
+#include <altivec.h>
+
+void v_expand_u8(vector unsigned char* a, vector unsigned short* b0, vector unsigned short* b1)
+{
+  *b0 = (vector unsigned short)vec_mergeh(*a, vec_splats((unsigned char)0));
+  *b1 = (vector unsigned short)vec_mergel(*a, vec_splats((unsigned char)0));
+}
+
+void v_expand_u16(vector unsigned short* a, vector unsigned int* b0, vector unsigned int* b1)
+{
+    *b0 = (vector unsigned int)vec_mergeh(*a, vec_splats((unsigned short)0));
+    *b1 = (vector unsigned int)vec_mergel(*a, vec_splats((unsigned short)0));
+}
+
+void v_load_deinterleave_u8(unsigned char *ptr, vector unsigned char* a, vector unsigned char* b, vector unsigned char* c)
+{
+    vector unsigned char v1 = vec_xl( 0, ptr);
+    vector unsigned char v2 = vec_xl(16, ptr);
+    vector unsigned char v3 = vec_xl(32, ptr);
+
+    static const vector unsigned char a12_perm = {0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 0, 0, 0, 0, 0};
+    static const vector unsigned char a123_perm = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 17, 20, 23, 26, 29};
+    *a = vec_perm(vec_perm(v1, v2, a12_perm), v3, a123_perm);
+
+    static const vector unsigned char b12_perm = {1, 4, 7, 10, 13, 16, 19, 22, 25, 28, 31, 0, 0, 0, 0, 0};
+    static const vector unsigned char b123_perm = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 18, 21, 24, 27, 30};
+    *b = vec_perm(vec_perm(v1, v2, b12_perm), v3, b123_perm);
+
+    static const vector unsigned char c12_perm = {2, 5, 8, 11, 14, 17, 20, 23, 26, 29, 0, 0, 0, 0, 0, 0};
+    static const vector unsigned char c123_perm = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 16, 19, 22, 25, 28, 31};
+    *c = vec_perm(vec_perm(v1, v2, c12_perm), v3, c123_perm);
+}
+
+void v_load_deinterleave_f32(float *ptr, vector float* a, vector float* b, vector float* c)
+{
+    vector float v1 = vec_xl( 0, ptr);
+    vector float v2 = vec_xl(16, ptr);
+    vector float v3 = vec_xl(32, ptr);
+
+    static const vector unsigned char flp = {0, 1, 2, 3, 12, 13, 14, 15, 16, 17, 18, 19, 28, 29, 30, 31};
+    *a = vec_perm(v1, vec_sld(v3, v2, 8), flp);
+
+    static const vector unsigned char flp2 = {28, 29, 30, 31, 0, 1, 2, 3, 12, 13, 14, 15, 16, 17, 18, 19};
+    *b = vec_perm(v2, vec_sld(v1, v3, 8), flp2);
+
+    *c = vec_perm(vec_sld(v2, v1, 8), v3, flp);
+}
+
+void v_store_interleave_f32(float *ptr, vector float a, vector float b, vector float c)
+{
+    vector float hbc = vec_mergeh(b, c);
+
+    static const vector unsigned char ahbc = {0, 1, 2, 3, 16, 17, 18, 19, 20, 21, 22, 23, 4, 5, 6, 7};
+    vec_xst(vec_perm(a, hbc, ahbc),  0, ptr);
+
+    vector float lab = vec_mergel(a, b);
+    vec_xst(vec_sld(lab, hbc, 8), 16, ptr);
+
+    static const vector unsigned char clab = {8, 9, 10, 11, 24, 25, 26, 27, 28, 29, 30, 31, 12, 13, 14, 15};
+    vec_xst(vec_perm(c, lab, clab), 32, ptr);
+}
+
+vector float v_cvt_f32(vector unsigned int a)
+{
+    return (vector float)vec_ctf(a, 0);
+}
+
+void acc_simd_(const unsigned char* src, float* dst, const unsigned char* mask, int len)
+{
+    int x = 0;
+    const int cVectorWidth = 16;
+
+            for ( ; x <= len - cVectorWidth; x += cVectorWidth)
+            {
+                vector unsigned char v_mask = vec_xl(0, mask + x);
+                v_mask = (vector unsigned char)vec_cmpeq(vec_splats((unsigned char)0), v_mask);
+                v_mask = (vector unsigned char)vec_nor(v_mask, v_mask);
+                vector unsigned char v_src0, v_src1, v_src2;
+                v_load_deinterleave_u8((unsigned char *)(src + (x * 3)), &v_src0, &v_src1, &v_src2);
+                v_src0 = v_src0 & v_mask;
+                v_src1 = v_src1 & v_mask;
+                v_src2 = v_src2 & v_mask;
+
+                /* expand 16 uchar into 4 vectors, each containing 4 uint */
+                vector unsigned short v_src00, v_src01, v_src10, v_src11, v_src20, v_src21;
+                v_expand_u8(&v_src0, &v_src00, &v_src01);
+                v_expand_u8(&v_src1, &v_src10, &v_src11);
+                v_expand_u8(&v_src2, &v_src20, &v_src21);
+                vector unsigned int v_src000, v_src001, v_src010, v_src011;
+                vector unsigned int v_src100, v_src101, v_src110, v_src111;
+                vector unsigned int v_src200, v_src201, v_src210, v_src211;
+                v_expand_u16(&v_src00, &v_src000, &v_src001);
+                v_expand_u16(&v_src01, &v_src010, &v_src011);
+                v_expand_u16(&v_src10, &v_src100, &v_src101);
+                v_expand_u16(&v_src11, &v_src110, &v_src111);
+                v_expand_u16(&v_src20, &v_src200, &v_src201);
+                v_expand_u16(&v_src21, &v_src210, &v_src211);
+
+                vector float v_dst000, v_dst001, v_dst010, v_dst011;
+                vector float v_dst100, v_dst101, v_dst110, v_dst111;
+                vector float v_dst200, v_dst201, v_dst210, v_dst211;
+                v_load_deinterleave_f32(dst + (x * 3),        &v_dst000, &v_dst100, &v_dst200);
+                v_load_deinterleave_f32(dst + ((x + 4) * 3),  &v_dst001, &v_dst101, &v_dst201);
+                v_load_deinterleave_f32(dst + ((x + 8) * 3),  &v_dst010, &v_dst110, &v_dst210);
+                v_load_deinterleave_f32(dst + ((x + 12) * 3), &v_dst011, &v_dst111, &v_dst211);
+
+                v_store_interleave_f32(dst + (x * 3),        vec_add(v_dst000, v_cvt_f32(v_src000)), vec_add(v_dst100, v_cvt_f32(v_src100)), vec_add(v_dst200, v_cvt_f32(v_src200)));
+                v_store_interleave_f32(dst + ((x + 4) * 3),  vec_add(v_dst001, v_cvt_f32(v_src001)), vec_add(v_dst101, v_cvt_f32(v_src101)), vec_add(v_dst201, v_cvt_f32(v_src201)));
+                v_store_interleave_f32(dst + ((x + 8) * 3),  vec_add(v_dst010, v_cvt_f32(v_src010)), vec_add(v_dst110, v_cvt_f32(v_src110)), vec_add(v_dst210, v_cvt_f32(v_src210)));
+                v_store_interleave_f32(dst + ((x + 12) * 3), vec_add(v_dst011, v_cvt_f32(v_src011)), vec_add(v_dst111, v_cvt_f32(v_src111)), vec_add(v_dst211, v_cvt_f32(v_src211)));
+            }
+    return;
+}
+
+void acc_(const unsigned char* src, float* dst, const unsigned char* mask, int len)
+{
+    int x = 0;
+    src += (x * 3);
+    dst += (x * 3);
+    for( ; x < len; x++, src += 3, dst += 3 )
+    {
+        if( mask[x] ) /* if mask, R/G/B dst[] += src[] */
+        {
+            for( int k = 0; k < 3; k++ )
+            {
+                dst[k] += src[k];
+            }
+        }
+    }
+    return;
+}
+
+#define N 16
+
+int main(int argc, char *argv[])
+{
+    unsigned char __attribute__ ((aligned (16) )) mask[] = {0, 0, 0, 0,  0, 0, 0, 0,  0, 1, 0, 0,  1, 0, 0, 1};
+    unsigned char __attribute__ ((aligned (16) )) src[3*N];
+    float __attribute__ ((aligned (16) )) dst[3*N];
+    float __attribute__ ((aligned (16) )) exp[3*N];
+
+    int i;
+
+    /* initialize src and dst */
+    for (i=0; i<3*N; i++) src[i] = (unsigned char)(i*3);
+    for (i=0; i<3*N; i++) {dst[i] = i * 1.0f; exp[i] = dst[i];}
+
+    acc_(src, exp, mask, N);
+    acc_simd_(src, dst, mask, N);
+
+    for (i=0; i<N; i++)
+    {
+        if ((dst[3*i] != exp[3*i]) || (dst[3*i+1] != exp[3*i+1]) || (dst[3*i+2] != exp[3*i+2]))
+	  __builtin_abort ();
+    }
+
+    return 0;
+}