From patchwork Mon Jan 16 18:12:22 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Bill Schmidt
X-Patchwork-Id: 715879
To: GCC Patches
Cc: Segher Boessenkool, David Edelsohn
From: Bill Schmidt
Subject: [PATCH, rs6000] Add support for vec_rlnm and vec_rlmi
Date: Mon, 16 Jan 2017 12:12:22 -0600
Message-Id: <26e01065-a7b2-9a11-96e9-579ad3e1614b@linux.vnet.ibm.com>

Hi,

ISA 3.0 introduces new instructions vrlwmi, vrldmi, vrlwnm, and vrldnm.
This patch provides access to them via built-ins, including the vec_rlmi
and vec_rlnm built-ins mandated by Appendix A of the ELFv2 ABI document.
I also added a vec_vrlnm built-in, which is a more direct translation of
the vrlwnm and vrldnm instructions that some users might prefer.

This has been bootstrapped and tested on powerpc64le-unknown-linux-gnu
with no regressions.  I am in the process of testing it on a big-endian
system as well.  Provided there are no problems there, is this ok for
trunk?

Thanks,
Bill


[gcc]

2017-01-16  Bill Schmidt

	* config/rs6000/altivec.h (vec_rlmi): New #define.
	(vec_vrlnm): Likewise.
	(vec_rlnm): Likewise.
	* config/rs6000/altivec.md (UNSPEC_VRLMI): New UNSPEC enum value.
	(UNSPEC_VRLNM): Likewise.
	(VIlong): New mode iterator.
	(altivec_vrl<VI_char>mi): New define_insn.
	(altivec_vrl<VI_char>nm): Likewise.
	* config/rs6000/rs6000-builtin.def (VRLWNM): New monomorphic
	function entry.
	(VRLDNM): Likewise.
	(RLNM): New polymorphic function entry.
	(VRLWMI): New monomorphic function entry.
	(VRLDMI): Likewise.
	(RLMI): New polymorphic function entry.
	* config/rs6000/rs6000-c.c (altivec_overloaded_builtin_table): Add
	new entries for P9V_BUILTIN_VEC_RLMI and P9V_BUILTIN_VEC_RLNM.
	* doc/extend.texi: Add description of vec_rlmi, vec_rlnm, and
	vec_vrlnm.

[gcc/testsuite]

2017-01-16  Bill Schmidt

	* gcc.target/powerpc/vec-rlmi-rlnm.c: New file.

Index: gcc/config/rs6000/altivec.h
===================================================================
--- gcc/config/rs6000/altivec.h	(revision 244498)
+++ gcc/config/rs6000/altivec.h	(working copy)
@@ -168,6 +168,9 @@
 #define vec_re __builtin_vec_re
 #define vec_round __builtin_vec_round
 #define vec_recipdiv __builtin_vec_recipdiv
+#define vec_rlmi __builtin_vec_rlmi
+#define vec_vrlnm __builtin_vec_rlnm
+#define vec_rlnm(a,b,c) (__builtin_vec_rlnm((a),((c)<<8)|(b)))
 #define vec_rsqrt __builtin_vec_rsqrt
 #define vec_rsqrte __builtin_vec_rsqrte
 #define vec_vsubfp __builtin_vec_vsubfp
Index: gcc/config/rs6000/altivec.md
===================================================================
--- gcc/config/rs6000/altivec.md	(revision 244498)
+++ gcc/config/rs6000/altivec.md	(working copy)
@@ -156,6 +156,8 @@
    UNSPEC_CMPRB
    UNSPEC_CMPRB2
    UNSPEC_CMPEQB
+   UNSPEC_VRLMI
+   UNSPEC_VRLNM
 ])
 
 (define_c_enum "unspecv"
@@ -168,8 +170,10 @@
 
 ;; Like VI, defined in vector.md, but add ISA 2.07 integer vector ops
 (define_mode_iterator VI2 [V4SI V8HI V16QI V2DI])
-;; Short vec in modes
+;; Short vec int modes
 (define_mode_iterator VIshort [V8HI V16QI])
+;; Longer vec int modes for rotate/mask ops
+(define_mode_iterator VIlong [V2DI V4SI])
 ;; Vec float modes
 (define_mode_iterator VF [V4SF])
 ;; Vec modes, pity mode iterators are not composable
@@ -1627,6 +1631,25 @@
   "vrl<VI_char> %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
+(define_insn "altivec_vrl<VI_char>mi"
+  [(set (match_operand:VIlong 0 "register_operand" "=v")
+        (unspec:VIlong [(match_operand:VIlong 1 "register_operand" "0")
+                        (match_operand:VIlong 2 "register_operand" "v")
+                        (match_operand:VIlong 3 "register_operand" "v")]
+                       UNSPEC_VRLMI))]
+  "TARGET_P9_VECTOR"
+  "vrl<VI_char>mi %0,%2,%3"
+  [(set_attr "type" "veclogical")])
+
+(define_insn "altivec_vrl<VI_char>nm"
+  [(set (match_operand:VIlong 0 "register_operand" "=v")
+        (unspec:VIlong [(match_operand:VIlong 1 "register_operand" "v")
+                        (match_operand:VIlong 2 "register_operand" "v")]
+                       UNSPEC_VRLNM))]
+  "TARGET_P9_VECTOR"
+  "vrl<VI_char>nm %0,%1,%2"
+  [(set_attr "type" "veclogical")])
+
 (define_insn "altivec_vsl"
   [(set (match_operand:V4SI 0 "register_operand" "=v")
         (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")
Index: gcc/config/rs6000/rs6000-builtin.def
===================================================================
--- gcc/config/rs6000/rs6000-builtin.def	(revision 244498)
+++ gcc/config/rs6000/rs6000-builtin.def	(working copy)
@@ -1918,6 +1918,8 @@ BU_P9V_OVERLOAD_2 (VSRV, "vsrv")
 BU_P9V_AV_2 (VADUB,   "vadub",   CONST,  vaduv16qi3)
 BU_P9V_AV_2 (VADUH,   "vaduh",   CONST,  vaduv8hi3)
 BU_P9V_AV_2 (VADUW,   "vaduw",   CONST,  vaduv4si3)
+BU_P9V_AV_2 (VRLWNM,  "vrlwnm",  CONST,  altivec_vrlwnm)
+BU_P9V_AV_2 (VRLDNM,  "vrldnm",  CONST,  altivec_vrldnm)
 
 /* ISA 3.0 vector overloaded 2 argument functions.  */
 BU_P9V_OVERLOAD_2 (VADU,  "vadu")
@@ -1924,7 +1926,15 @@ BU_P9V_OVERLOAD_2 (VADU, "vadu")
 BU_P9V_OVERLOAD_2 (VADUB, "vadub")
 BU_P9V_OVERLOAD_2 (VADUH, "vaduh")
 BU_P9V_OVERLOAD_2 (VADUW, "vaduw")
+BU_P9V_OVERLOAD_2 (RLNM,  "rlnm")
 
+/* ISA 3.0 3-argument vector functions.  */
+BU_P9V_AV_3 (VRLWMI,  "vrlwmi",  CONST,  altivec_vrlwmi)
+BU_P9V_AV_3 (VRLDMI,  "vrldmi",  CONST,  altivec_vrldmi)
+
+/* ISA 3.0 vector overloaded 3-argument functions.  */
+BU_P9V_OVERLOAD_3 (RLMI,  "rlmi")
+
 /* 1 argument vsx scalar functions added in ISA 3.0 (power9).  */
 BU_P9V_64BIT_VSX_1 (VSEEDP, "scalar_extract_exp", CONST, xsxexpdp)
 BU_P9V_64BIT_VSX_1 (VSESDP, "scalar_extract_sig", CONST, xsxsigdp)
Index: gcc/config/rs6000/rs6000-c.c
===================================================================
--- gcc/config/rs6000/rs6000-c.c	(revision 244498)
+++ gcc/config/rs6000/rs6000-c.c	(working copy)
@@ -2202,6 +2202,18 @@ const struct altivec_builtin_types altivec_overloa
     RS6000_BTI_V16QI, RS6000_BTI_V16QI, RS6000_BTI_unsigned_V16QI, 0 },
   { ALTIVEC_BUILTIN_VEC_VRLB, ALTIVEC_BUILTIN_VRLB,
     RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0 },
+  { P9V_BUILTIN_VEC_RLMI, P9V_BUILTIN_VRLWMI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI },
+  { P9V_BUILTIN_VEC_RLMI, P9V_BUILTIN_VRLDMI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI },
+  { P9V_BUILTIN_VEC_RLNM, P9V_BUILTIN_VRLWNM,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
+    RS6000_BTI_unsigned_V4SI, 0 },
+  { P9V_BUILTIN_VEC_RLNM, P9V_BUILTIN_VRLDNM,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
+    RS6000_BTI_unsigned_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_SL, ALTIVEC_BUILTIN_VSLB,
     RS6000_BTI_V16QI, RS6000_BTI_V16QI, RS6000_BTI_unsigned_V16QI, 0 },
   { ALTIVEC_BUILTIN_VEC_SL, ALTIVEC_BUILTIN_VSLB,
Index: gcc/doc/extend.texi
===================================================================
--- gcc/doc/extend.texi	(revision 244498)
+++ gcc/doc/extend.texi	(working copy)
@@ -18179,6 +18179,43 @@ If any of the enabled test conditions is true, the
 in the result vector is -1.  Otherwise (all of the enabled test
 conditions are false), the corresponding entry of the result vector is 0.
 
+The following built-in functions are available for the PowerPC family
+of processors, starting with ISA 3.0 (@option{-mcpu=power9}):
+@smallexample
+vector unsigned int vec_rlmi (vector unsigned int, vector unsigned int,
+                              vector unsigned int);
+vector unsigned long long vec_rlmi (vector unsigned long long,
+                                    vector unsigned long long,
+                                    vector unsigned long long);
+vector unsigned int vec_rlnm (vector unsigned int, vector unsigned int,
+                              vector unsigned int);
+vector unsigned long long vec_rlnm (vector unsigned long long,
+                                    vector unsigned long long,
+                                    vector unsigned long long);
+vector unsigned int vec_vrlnm (vector unsigned int, vector unsigned int);
+vector unsigned long long vec_vrlnm (vector unsigned long long,
+                                     vector unsigned long long);
+@end smallexample
+
+The result of @code{vec_rlmi} is obtained by rotating each element of
+the first argument vector left and inserting it under mask into the
+second argument vector.  The third argument vector contains the mask
+beginning in bits 11:15, the mask end in bits 19:23, and the shift
+count in bits 27:31, of each element.
+
+The result of @code{vec_rlnm} is obtained by rotating each element of
+the first argument vector left and ANDing it with a mask specified by
+the second and third argument vectors.  The second argument vector
+contains the shift count for each element in the low-order byte.  The
+third argument vector contains the mask end for each element in the
+low-order byte, with the mask begin in the next higher byte.
+
+The result of @code{vec_vrlnm} is obtained by rotating each element
+of the first argument vector left and ANDing it with a mask.  The
+second argument vector contains the mask beginning in bits 11:15,
+the mask end in bits 19:23, and the shift count in bits 27:31,
+of each element.
+
 If the cryptographic instructions are enabled (@option{-mcrypto} or
 @option{-mcpu=power8}), the following builtins are enabled.
 
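The field layout described in the documentation text above can be modelled on a single 32-bit element in plain C. This is only a sketch under stated assumptions, not the patch's implementation: the helper names (rotl32, mask32, model_vrlwnm, model_vrlwmi) are hypothetical, bits are numbered with bit 0 as the most significant bit, and the mask is assumed to wrap around when the mask begin exceeds the mask end, as with the scalar rotate-and-mask instructions. Doubleword elements are not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (only the low 5 bits of n are used).  */
static inline uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;
    return n ? (x << n) | (x >> (32 - n)) : x;
}

/* Ones from bit mb through bit me, with bit 0 as the most significant bit.
   Assumption: the mask wraps around when mb > me.  */
static inline uint32_t mask32(unsigned mb, unsigned me)
{
    assert(mb < 32 && me < 32);
    uint32_t from_mb = 0xFFFFFFFFu >> mb;         /* bits mb..31 */
    uint32_t to_me   = 0xFFFFFFFFu << (31 - me);  /* bits 0..me  */
    return (mb <= me) ? (from_mb & to_me) : (from_mb | to_me);
}

/* Hypothetical scalar model of one word element of vec_vrlnm:
   rotate left, then AND with the mask.  */
static inline uint32_t model_vrlwnm(uint32_t a, uint32_t ctl)
{
    unsigned mb = (ctl >> 16) & 31;  /* bits 11:15: mask begin  */
    unsigned me = (ctl >> 8) & 31;   /* bits 19:23: mask end    */
    unsigned sh = ctl & 31;          /* bits 27:31: shift count */
    return rotl32(a, sh) & mask32(mb, me);
}

/* Hypothetical scalar model of one word element of vec_rlmi, following the
   documentation wording: rotate the first argument left and insert it
   under mask into the second argument.  */
static inline uint32_t model_vrlwmi(uint32_t a, uint32_t b, uint32_t ctl)
{
    unsigned mb = (ctl >> 16) & 31;
    unsigned me = (ctl >> 8) & 31;
    unsigned sh = ctl & 31;
    uint32_t m = mask32(mb, me);
    return (rotl32(a, sh) & m) | (b & ~m);
}
```

For example, with mask begin 24, mask end 31 and shift count 8 (control word 0x00181F08), model_vrlwnm (0x12345678, 0x00181F08) yields 0x00000012, and model_vrlwmi with a second argument of 0xAAAAAAAA yields 0xAAAAAA12.
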
Index: gcc/testsuite/gcc.target/powerpc/vec-rlmi-rlnm.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vec-rlmi-rlnm.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vec-rlmi-rlnm.c	(working copy)
@@ -0,0 +1,69 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-O2 -mcpu=power9" } */
+
+#include <altivec.h>
+
+vector unsigned int
+rlmi_test_1 (vector unsigned int x, vector unsigned int y,
+             vector unsigned int z)
+{
+  return vec_rlmi (x, y, z);
+}
+
+vector unsigned long long
+rlmi_test_2 (vector unsigned long long x, vector unsigned long long y,
+             vector unsigned long long z)
+{
+  return vec_rlmi (x, y, z);
+}
+
+vector unsigned int
+vrlnm_test_1 (vector unsigned int x, vector unsigned int y)
+{
+  return vec_vrlnm (x, y);
+}
+
+vector unsigned long long
+vrlnm_test_2 (vector unsigned long long x, vector unsigned long long y)
+{
+  return vec_vrlnm (x, y);
+}
+
+vector unsigned int
+rlnm_test_1 (vector unsigned int x, vector unsigned int y,
+             vector unsigned int z)
+{
+  return vec_rlnm (x, y, z);
+}
+
+vector unsigned long long
+rlnm_test_2 (vector unsigned long long x, vector unsigned long long y,
+             vector unsigned long long z)
+{
+  return vec_rlnm (x, y, z);
+}
+
+/* Expected code generation for rlmi_test_1 is vrlwmi.
+   Expected code generation for rlmi_test_2 is vrldmi.
+   Expected code generation for vrlnm_test_1 is vrlwnm.
+   Expected code generation for vrlnm_test_2 is vrldnm.
+   Expected code generation for the others is more complex, because
+   the second and third arguments are combined by a shift and OR,
+   and because there is no splat-immediate doubleword.
+    - For rlnm_test_1: vspltisw, vslw, xxlor, vrlwnm.
+    - For rlnm_test_2: xxspltib, vextsb2d, vsld, xxlor, vrldnm.
+   There is a choice of splat instructions in both cases, so we
+   just check for "splt".  */
+
+/* { dg-final { scan-assembler-times "vrlwmi" 1 } } */
+/* { dg-final { scan-assembler-times "vrldmi" 1 } } */
+/* { dg-final { scan-assembler-times "splt" 2 } } */
+/* { dg-final { scan-assembler-times "vextsb2d" 1 } } */
+/* { dg-final { scan-assembler-times "vslw" 1 } } */
+/* { dg-final { scan-assembler-times "vsld" 1 } } */
+/* { dg-final { scan-assembler-times "xxlor" 2 } } */
+/* { dg-final { scan-assembler-times "vrlwnm" 2 } } */
+/* { dg-final { scan-assembler-times "vrldnm" 2 } } */
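
The "shift and OR" mentioned in the test comment above can be written out for one 32-bit element. A minimal sketch, assuming the argument layout given in the documentation text (shift count in the low-order byte of the second argument; mask end in the low-order byte and mask begin in the next higher byte of the third argument). The names pack_rlnm_control, ctl_shift, ctl_mask_end and ctl_mask_begin are hypothetical and not part of the patch.

```c
#include <stdint.h>

/* Combine the per-element second and third vec_rlnm arguments into the
   single control word that the two-argument form takes.  Shifting the mask
   argument left by one byte moves mask end and mask begin above the shift
   count.  Assumption: shift_arg has nothing set above its low-order byte;
   stray high bits would be ORed into the mask fields.  */
static inline uint32_t pack_rlnm_control(uint32_t shift_arg, uint32_t mask_arg)
{
    return (mask_arg << 8) | shift_arg;
}

/* Field accessors for a packed control word (word elements only).  */
static inline unsigned ctl_shift(uint32_t ctl)      { return ctl & 31; }
static inline unsigned ctl_mask_end(uint32_t ctl)   { return (ctl >> 8) & 31; }
static inline unsigned ctl_mask_begin(uint32_t ctl) { return (ctl >> 16) & 31; }
```

With a shift count of 8 and a mask argument of 0x181F (mask begin 24, mask end 31), the packed control word is 0x00181F08, which places the three fields in the byte positions that correspond to bits 11:15, 19:23 and 27:31 of the element.
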