From patchwork Thu Jan 4 14:16:06 2018
X-Patchwork-Submitter: Bill Schmidt
X-Patchwork-Id: 855609
To: GCC Patches
Cc: Segher Boessenkool, David Edelsohn
From: Bill Schmidt
Subject: [PATCH, rs6000] Fix PR83677 (incorrect generation of xxpermr)
Date: Thu, 4 Jan 2018 08:16:06 -0600
Message-Id: <00763fdd-3674-d40d-867a-e2425ccf7808@linux.vnet.ibm.com>

Hi,

https://gcc.gnu.org/PR83677 reports that generation of xxpermr is
always wrong.  It effectively swaps the two input registers relative to
the order they should be in.  This patch addresses that and provides a
test case modified from the original report.

Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.
Is this okay for trunk and, shortly afterward, for backport to GCC 7?
I will check on 6, but I'm pretty certain this was introduced in 7, as
6 has only minimal POWER9 support.

Thanks,
Bill


[gcc]

2018-01-04  Bill Schmidt

        PR target/83677
        * config/rs6000/altivec.md (*altivec_vpermr_<mode>_internal):
        Reverse operand 1 and 2 constraints for second alternative;
        output operand 2 in second position rather than operand 1.

[gcc/testsuite]

2018-01-04  Bill Schmidt

        PR target/83677
        * gcc.target/powerpc/pr83677.c: New file.

Index: gcc/config/rs6000/altivec.md
===================================================================
--- gcc/config/rs6000/altivec.md (revision 256216)
+++ gcc/config/rs6000/altivec.md (working copy)
@@ -2200,14 +2200,14 @@

 (define_insn "*altivec_vpermr_<mode>_internal"
   [(set (match_operand:VM 0 "register_operand" "=v,?wo")
-        (unspec:VM [(match_operand:VM 1 "register_operand" "v,wo")
-                    (match_operand:VM 2 "register_operand" "v,0")
+        (unspec:VM [(match_operand:VM 1 "register_operand" "v,0")
+                    (match_operand:VM 2 "register_operand" "v,wo")
                     (match_operand:V16QI 3 "register_operand" "v,wo")]
                    UNSPEC_VPERMR))]
   "TARGET_P9_VECTOR"
   "@
    vpermr %0,%2,%1,%3
-   xxpermr %x0,%x1,%x3"
+   xxpermr %x0,%x2,%x3"
   [(set_attr "type" "vecperm")
    (set_attr "length" "4")])

Index: gcc/testsuite/gcc.target/powerpc/pr83677.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr83677.c (nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/pr83677.c (working copy)
@@ -0,0 +1,166 @@
+/* { dg-do run { target { powerpc64*-*-* && { lp64 && p9vector_hw } } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-options "-mcpu=power9 -O2 " } */
+
+/* PR83677: This
+   test case used to fail due to mis-generation of the
+   xxpermr instruction.  It requires inlining to create enough register
+   pressure that we generate xxpermr rather than vpermr.  */
+
+#include <altivec.h>
+
+void v_expand_u8(vector unsigned char* a, vector unsigned short* b0, vector unsigned short* b1)
+{
+  *b0 = (vector unsigned short)vec_mergeh(*a, vec_splats((unsigned char)0));
+  *b1 = (vector unsigned short)vec_mergel(*a, vec_splats((unsigned char)0));
+}
+
+void v_expand_u16(vector unsigned short* a, vector unsigned int* b0, vector unsigned int* b1)
+{
+  *b0 = (vector unsigned int)vec_mergeh(*a, vec_splats((unsigned short)0));
+  *b1 = (vector unsigned int)vec_mergel(*a, vec_splats((unsigned short)0));
+}
+
+void v_load_deinterleave_u8(unsigned char *ptr, vector unsigned char* a, vector unsigned char* b, vector unsigned char* c)
+{
+  vector unsigned char v1 = vec_xl( 0, ptr);
+  vector unsigned char v2 = vec_xl(16, ptr);
+  vector unsigned char v3 = vec_xl(32, ptr);
+
+  static const vector unsigned char a12_perm = {0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 0, 0, 0, 0, 0};
+  static const vector unsigned char a123_perm = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 17, 20, 23, 26, 29};
+  *a = vec_perm(vec_perm(v1, v2, a12_perm), v3, a123_perm);
+
+  static const vector unsigned char b12_perm = {1, 4, 7, 10, 13, 16, 19, 22, 25, 28, 31, 0, 0, 0, 0, 0};
+  static const vector unsigned char b123_perm = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 18, 21, 24, 27, 30};
+  *b = vec_perm(vec_perm(v1, v2, b12_perm), v3, b123_perm);
+
+  static const vector unsigned char c12_perm = {2, 5, 8, 11, 14, 17, 20, 23, 26, 29, 0, 0, 0, 0, 0, 0};
+  static const vector unsigned char c123_perm = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 16, 19, 22, 25, 28, 31};
+  *c = vec_perm(vec_perm(v1, v2, c12_perm), v3, c123_perm);
+}
+
+void v_load_deinterleave_f32(float *ptr, vector float* a, vector float* b, vector float* c)
+{
+  vector float v1 = vec_xl( 0, ptr);
+  vector float v2 = vec_xl(16, ptr);
+  vector float v3 = vec_xl(32, ptr);
+
+  static const vector unsigned char flp = {0, 1, 2, 3, 12, 13, 14, 15, 16, 17, 18, 19, 28, 29, 30, 31};
+  *a = vec_perm(v1, vec_sld(v3, v2, 8), flp);
+
+  static const vector unsigned char flp2 = {28, 29, 30, 31, 0, 1, 2, 3, 12, 13, 14, 15, 16, 17, 18, 19};
+  *b = vec_perm(v2, vec_sld(v1, v3, 8), flp2);
+
+  *c = vec_perm(vec_sld(v2, v1, 8), v3, flp);
+}
+
+void v_store_interleave_f32(float *ptr, vector float a, vector float b, vector float c)
+{
+  vector float hbc = vec_mergeh(b, c);
+
+  static const vector unsigned char ahbc = {0, 1, 2, 3, 16, 17, 18, 19, 20, 21, 22, 23, 4, 5, 6, 7};
+  vec_xst(vec_perm(a, hbc, ahbc), 0, ptr);
+
+  vector float lab = vec_mergel(a, b);
+  vec_xst(vec_sld(lab, hbc, 8), 16, ptr);
+
+  static const vector unsigned char clab = {8, 9, 10, 11, 24, 25, 26, 27, 28, 29, 30, 31, 12, 13, 14, 15};
+  vec_xst(vec_perm(c, lab, clab), 32, ptr);
+}
+
+vector float v_cvt_f32(vector unsigned int a)
+{
+  return (vector float)vec_ctf(a, 0);
+}
+
+void acc_simd_(const unsigned char* src, float* dst, const unsigned char* mask, int len)
+{
+  int x = 0;
+  const int cVectorWidth = 16;
+
+  for ( ; x <= len - cVectorWidth; x += cVectorWidth)
+    {
+      vector unsigned char v_mask = vec_xl(0, mask + x);
+      v_mask = (vector unsigned char)vec_cmpeq(vec_splats((unsigned char)0), v_mask);
+      v_mask = (vector unsigned char)vec_nor(v_mask, v_mask);
+      vector unsigned char v_src0, v_src1, v_src2;
+      v_load_deinterleave_u8((unsigned char *)(src + (x * 3)), &v_src0, &v_src1, &v_src2);
+      v_src0 = v_src0 & v_mask;
+      v_src1 = v_src1 & v_mask;
+      v_src2 = v_src2 & v_mask;
+
+      /* expand 16 uchar to 4 vectors which contains 4 uint */
+      vector unsigned short v_src00, v_src01, v_src10, v_src11, v_src20, v_src21;
+      v_expand_u8(&v_src0, &v_src00, &v_src01);
+      v_expand_u8(&v_src1, &v_src10, &v_src11);
+      v_expand_u8(&v_src2, &v_src20, &v_src21);
+      vector unsigned int v_src000, v_src001, v_src010, v_src011;
+      vector unsigned int v_src100, v_src101, v_src110, v_src111;
+      vector unsigned int
+        v_src200, v_src201, v_src210, v_src211;
+      v_expand_u16(&v_src00, &v_src000, &v_src001);
+      v_expand_u16(&v_src01, &v_src010, &v_src011);
+      v_expand_u16(&v_src10, &v_src100, &v_src101);
+      v_expand_u16(&v_src11, &v_src110, &v_src111);
+      v_expand_u16(&v_src20, &v_src200, &v_src201);
+      v_expand_u16(&v_src21, &v_src210, &v_src211);
+
+      vector float v_dst000, v_dst001, v_dst010, v_dst011;
+      vector float v_dst100, v_dst101, v_dst110, v_dst111;
+      vector float v_dst200, v_dst201, v_dst210, v_dst211;
+      v_load_deinterleave_f32(dst + (x * 3), &v_dst000, &v_dst100, &v_dst200);
+      v_load_deinterleave_f32(dst + ((x + 4) * 3), &v_dst001, &v_dst101, &v_dst201);
+      v_load_deinterleave_f32(dst + ((x + 8) * 3), &v_dst010, &v_dst110, &v_dst210);
+      v_load_deinterleave_f32(dst + ((x + 12) * 3), &v_dst011, &v_dst111, &v_dst211);
+
+      v_store_interleave_f32(dst + (x * 3), vec_add(v_dst000, v_cvt_f32(v_src000)), vec_add(v_dst100, v_cvt_f32(v_src100)), vec_add(v_dst200, v_cvt_f32(v_src200)));
+      v_store_interleave_f32(dst + ((x + 4) * 3), vec_add(v_dst001, v_cvt_f32(v_src001)), vec_add(v_dst101, v_cvt_f32(v_src101)), vec_add(v_dst201, v_cvt_f32(v_src201)));
+      v_store_interleave_f32(dst + ((x + 8) * 3), vec_add(v_dst010, v_cvt_f32(v_src010)), vec_add(v_dst110, v_cvt_f32(v_src110)), vec_add(v_dst210, v_cvt_f32(v_src210)));
+      v_store_interleave_f32(dst + ((x + 12) * 3), vec_add(v_dst011, v_cvt_f32(v_src011)), vec_add(v_dst111, v_cvt_f32(v_src111)), vec_add(v_dst211, v_cvt_f32(v_src211)));
+    }
+  return;
+}
+
+void acc_(const unsigned char* src, float* dst, const unsigned char* mask, int len)
+{
+  int x = 0;
+  src += (x * 3);
+  dst += (x * 3);
+  for( ; x < len; x++, src += 3, dst += 3 )
+    {
+      if( mask[x] ) /* if mask, R/G/B dst[] += src[] */
+        {
+          for( int k = 0; k < 3; k++ )
+            {
+              dst[k] += src[k];
+            }
+        }
+    }
+  return;
+}
+
+#define N 16
+
+int main(int argc, char *argv[])
+{
+  unsigned char __attribute__ ((aligned (16) )) mask[] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1};
+  unsigned char
+    __attribute__ ((aligned (16) )) src[3*N];
+  float __attribute__ ((aligned (16) )) dst[3*N];
+  float __attribute__ ((aligned (16) )) exp[3*N];
+
+  int i;
+
+  /* initialize src and dst */
+  for (i=0; i<3*N; i++) src[i] = (unsigned char)(i*3);
+  for (i=0; i<3*N; i++) {dst[i] = i * 1.0f; exp[i] = dst[i];}
+
+  acc_(src, exp, mask, N);
+  acc_simd_(src, dst, mask, N);
+
+  for (i=0; i