From patchwork Fri Jul 26 22:37:32 2024
X-Patchwork-Submitter: Carl Love
X-Patchwork-Id: 1965418
Date: Fri, 26 Jul 2024 15:37:32 -0700
From: Carl Love
Subject: [PATCH ver 2] rs6000, Add new overloaded vector shift builtin int128, varients
To: GCC Patches, Kewen, Peter Bergner, segher, cel, David Edelsohn

GCC developers:

Version 2: updated rs6000-overload.def to stop adding additional internal
names and to change XXSLDWI_Q to XXSLDWI_1TI, per comments from Kewen.
Moved the new documentation statement for the PVIPR built-ins, per comments
from Kewen.  Updated the dg-do run directive and added a comment about
-save-temps in the testcase, per feedback from Segher.  Retested the patch
on Power 10 with no regressions.

The following patch adds the int128 variants to the existing overloaded
built-ins vec_sld, vec_sldb, vec_sldw, vec_sll, vec_slo, vec_srdb, vec_srl,
vec_sro.  These variants were requested by Steve Munroe.

The patch has been tested on a Power 10 system with no regressions.

Please let me know if the patch is acceptable for mainline.

                                   Carl

---------------------------------------------------------------
rs6000, Add new overloaded vector shift builtin int128 variants

Add the signed __int128 and unsigned __int128 argument types for the
overloaded built-ins vec_sld, vec_sldb, vec_sldw, vec_sll, vec_slo,
vec_srdb, vec_srl, vec_sro.  For each of the new argument types add a
testcase and update the documentation for the built-in.
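
Note for reviewers: the expected values used for vec_sldb and vec_srdb in
the testcase can be sanity-checked without Power10 hardware.  The sketch
below is an illustration only and is not part of the patch.  It assumes a
host compiler that supports unsigned __int128 (GCC or Clang on a 64-bit
target), and the names u128, make_u128, model_sldb, model_srdb and
check_model are hypothetical helpers, not GCC built-ins.  It models vec_sldb
as the high 128 bits of the pair a:b shifted left by the low-order three
bits of the count, and vec_srdb as the low 128 bits of a:b shifted right,
following the wording in extend.texi.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the host compiler provides unsigned __int128.  */
typedef unsigned __int128 u128;

/* Hypothetical helper: build a 128-bit value from two 64-bit halves.  */
static u128
make_u128 (uint64_t hi, uint64_t lo)
{
  return ((u128) hi << 64) | lo;
}

/* Hypothetical model of vec_sldb: shift the 256-bit pair a:b left by
   the low-order three bits of n and keep the high 128 bits.  */
static u128
model_sldb (u128 a, u128 b, unsigned int n)
{
  n &= 7;
  /* n == 0 is handled separately to avoid a shift by 128 bits.  */
  return n ? (a << n) | (b >> (128 - n)) : a;
}

/* Hypothetical model of vec_srdb: shift the 256-bit pair a:b right by
   the low-order three bits of n and keep the low 128 bits.  */
static u128
model_srdb (u128 a, u128 b, unsigned int n)
{
  n &= 7;
  return n ? (b >> n) | (a << (128 - n)) : b;
}

/* Inputs and expected results copied from the signed vec_sldb and
   vec_srdb cases in the new testcase.  */
static void
check_model (void)
{
  u128 a = make_u128 (0x123456789ABCDEF0ULL, 0x123456789ABCDEF0ULL);
  u128 b = make_u128 (0xFEDCBA9876543210ULL, 0xFEDCBA9876543210ULL);

  assert (model_sldb (a, b, 4)
          == make_u128 (0x23456789ABCDEF01ULL, 0x23456789ABCDEF0FULL));
  assert (model_srdb (0x12345678, 0xFEDCBA90, 4)
          == make_u128 (0x8000000000000000ULL, 0x0FEDCBA9ULL));
}
```

Calling check_model () from a small driver reproduces those two
expectations.  The model says nothing about endianness handling or about
the byte-wise vec_slo and vec_sro forms.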

gcc/ChangeLog:
    * config/rs6000/altivec.md (vs<SLDB_lr>db_<mode>): Change
    define_insn iterator to VEC_IC.
    * config/rs6000/rs6000-builtins.def (__builtin_altivec_vsldoi_v1ti,
    __builtin_vsx_xxsldwi_v1ti, __builtin_altivec_vsldb_v1ti,
    __builtin_altivec_vsrdb_v1ti): New builtin definitions.
    * config/rs6000/rs6000-overload.def (vec_sld, vec_sldb, vec_sldw,
    vec_sll, vec_slo, vec_srdb, vec_srl, vec_sro): New overloaded
    definitions.
    * doc/extend.texi (vec_sld, vec_sldb, vec_sldw, vec_sll, vec_slo,
    vec_srdb, vec_srl, vec_sro): Add documentation for new overloaded
    built-ins.

gcc/testsuite/ChangeLog:
    * gcc.target/powerpc/vec-shift-double-runnable-int128.c: New test file.
---
 gcc/config/rs6000/altivec.md                  |   6 +-
 gcc/config/rs6000/rs6000-builtins.def         |  12 +
 gcc/config/rs6000/rs6000-overload.def         |  40 ++
 gcc/doc/extend.texi                           |  43 +++
 .../vec-shift-double-runnable-int128.c        | 358 ++++++++++++++++++
 5 files changed, 456 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable-int128.c

+
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times {\mvsrdbi\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvsldbi\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvsl\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvsr\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvslo\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mvsro\M} 4 } } */

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 5af9bf920a2..2a18ee44526 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -878,9 +878,9 @@ (define_int_attr SLDB_lr [(UNSPEC_SLDB "l")
 (define_int_iterator VSHIFT_DBL_LR [UNSPEC_SLDB UNSPEC_SRDB])

 (define_insn "vs<SLDB_lr>db_<mode>"
- [(set (match_operand:VI2 0 "register_operand" "=v")
-  (unspec:VI2 [(match_operand:VI2 1 "register_operand" "v")
-           (match_operand:VI2 2 "register_operand" "v")
+ [(set (match_operand:VEC_IC 0 "register_operand" "=v")
+  (unspec:VEC_IC [(match_operand:VEC_IC 1 "register_operand" "v")
+           (match_operand:VEC_IC 2 "register_operand" "v")
            (match_operand:QI 3 "const_0_to_12_operand" "n")]
           VSHIFT_DBL_LR))]
   "TARGET_POWER10"
diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index 77eb0f7e406..a2b2b729270 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -964,6 +964,9 @@
   const vss __builtin_altivec_vsldoi_8hi (vss, vss, const int<4>);
     VSLDOI_8HI altivec_vsldoi_v8hi {}

+  const vsq __builtin_altivec_vsldoi_v1ti (vsq, vsq, const int<4>);
+    VSLDOI_V1TI altivec_vsldoi_v1ti {}
+
   const vss __builtin_altivec_vslh (vss, vus);
     VSLH vashlv8hi3 {}

@@ -1831,6 +1834,9 @@
   const vsll __builtin_vsx_xxsldwi_2di (vsll, vsll, const int<2>);
     XXSLDWI_2DI vsx_xxsldwi_v2di {}

+  const vsq __builtin_vsx_xxsldwi_v1ti (vsq, vsq, const int<2>);
+    XXSLDWI_1TI vsx_xxsldwi_v1ti {}
+
   const vf __builtin_vsx_xxsldwi_4sf (vf, vf, const int<2>);
     XXSLDWI_4SF vsx_xxsldwi_v4sf {}

@@ -3299,6 +3305,9 @@
   const vss __builtin_altivec_vsldb_v8hi (vss, vss, const int<3>);
     VSLDB_V8HI vsldb_v8hi {}

+  const vsq __builtin_altivec_vsldb_v1ti (vsq, vsq, const int<3>);
+    VSLDB_V1TI vsldb_v1ti {}
+
   const vsq __builtin_altivec_vslq (vsq, vuq);
     VSLQ vashlv1ti3 {}

@@ -3317,6 +3326,9 @@
   const vss __builtin_altivec_vsrdb_v8hi (vss, vss, const int<3>);
     VSRDB_V8HI vsrdb_v8hi {}

+  const vsq __builtin_altivec_vsrdb_v1ti (vsq, vsq, const int<3>);
+    VSRDB_V1TI vsrdb_v1ti {}
+
   const vsq __builtin_altivec_vsrq (vsq, vuq);
     VSRQ vlshrv1ti3 {}

diff --git a/gcc/config/rs6000/rs6000-overload.def b/gcc/config/rs6000/rs6000-overload.def
index c4ecafc6f7e..96b0ecbd675 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -3399,6 +3399,10 @@
     VSLDOI_4SF
   vd __builtin_vec_sld (vd, vd, const int);
     VSLDOI_2DF
+  vsq __builtin_vec_sld (vsq, vsq, const int);
+    VSLDOI_V1TI  VSLDOI_VSQ
+  vuq __builtin_vec_sld (vuq, vuq, const int);
+    VSLDOI_V1TI  VSLDOI_VUQ

 [VEC_SLDB, vec_sldb, __builtin_vec_sldb]
   vsc __builtin_vec_sldb (vsc, vsc, const int);
@@ -3417,6 +3421,10 @@
     VSLDB_V2DI  VSLDB_VSLL
   vull __builtin_vec_sldb (vull, vull, const int);
     VSLDB_V2DI  VSLDB_VULL
+  vsq __builtin_vec_sldb (vsq, vsq, const int);
+    VSLDB_V1TI  VSLDB_VSQ
+  vuq __builtin_vec_sldb (vuq, vuq, const int);
+    VSLDB_V1TI  VSLDB_VUQ

 [VEC_SLDW, vec_sldw, __builtin_vec_sldw]
   vsc __builtin_vec_sldw (vsc, vsc, const int);
@@ -3439,6 +3447,10 @@
     XXSLDWI_4SF  XXSLDWI_VF
   vd __builtin_vec_sldw (vd, vd, const int);
     XXSLDWI_2DF  XXSLDWI_VD
+  vsq __builtin_vec_sldw (vsq, vsq, const int);
+    XXSLDWI_1TI  XXSLDWI_VSQ
+  vuq __builtin_vec_sldw (vuq, vuq, const int);
+    XXSLDWI_1TI  XXSLDWI_VUQ

 [VEC_SLL, vec_sll, __builtin_vec_sll]
   vsc __builtin_vec_sll (vsc, vuc);
@@ -3459,6 +3471,10 @@
     VSL  VSL_VSLL
   vull __builtin_vec_sll (vull, vuc);
     VSL  VSL_VULL
+  vsq __builtin_vec_sll (vsq, vuc);
+    VSL  VSL_VSQ
+  vuq __builtin_vec_sll (vuq, vuc);
+    VSL  VSL_VUQ
 ; The following variants are deprecated.
   vsc __builtin_vec_sll (vsc, vus);
     VSL  VSL_VSC_VUS
@@ -3554,6 +3570,14 @@
     VSLO  VSLO_VFS
   vf __builtin_vec_slo (vf, vuc);
     VSLO  VSLO_VFU
+  vsq __builtin_vec_slo (vsq, vsc);
+    VSLO  VSLDO_VSQS
+  vsq __builtin_vec_slo (vsq, vuc);
+    VSLO  VSLDO_VSQU
+  vuq __builtin_vec_slo (vuq, vsc);
+    VSLO  VSLDO_VUQS
+  vuq __builtin_vec_slo (vuq, vuc);
+    VSLO  VSLDO_VUQU

 [VEC_SLV, vec_slv, __builtin_vec_vslv]
   vuc __builtin_vec_vslv (vuc, vuc);
@@ -3699,6 +3723,10 @@
     VSRDB_V2DI  VSRDB_VSLL
   vull __builtin_vec_srdb (vull, vull, const int);
     VSRDB_V2DI  VSRDB_VULL
+  vsq __builtin_vec_srdb (vsq, vsq, const int);
+    VSRDB_V1TI  VSRDB_VSQ
+  vuq __builtin_vec_srdb (vuq, vuq, const int);
+    VSRDB_V1TI  VSRDB_VUQ

 [VEC_SRL, vec_srl, __builtin_vec_srl]
   vsc __builtin_vec_srl (vsc, vuc);
@@ -3719,6 +3747,10 @@
     VSR  VSR_VSLL
   vull __builtin_vec_srl (vull, vuc);
     VSR  VSR_VULL
+  vsq __builtin_vec_srl (vsq, vuc);
+    VSR  VSR_VSQ
+  vuq __builtin_vec_srl (vuq, vuc);
+    VSR  VSR_VUQ
 ; The following variants are deprecated.
   vsc __builtin_vec_srl (vsc, vus);
     VSR  VSR_VSC_VUS
@@ -3808,6 +3840,14 @@
     VSRO  VSRO_VFS
   vf __builtin_vec_sro (vf, vuc);
     VSRO  VSRO_VFU
+  vsq __builtin_vec_sro (vsq, vsc);
+    VSRO  VSRDO_VSQS
+  vsq __builtin_vec_sro (vsq, vuc);
+    VSRO  VSRDO_VSQU
+  vuq __builtin_vec_sro (vuq, vsc);
+    VSRO  VSRDO_VUQS
+  vuq __builtin_vec_sro (vuq, vuc);
+    VSRO  VSRDO_VUQU

 [VEC_SRV, vec_srv, __builtin_vec_vsrv]
   vuc __builtin_vec_vsrv (vuc, vuc);
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 0b572afca72..83ff168faf6 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -23504,6 +23504,10 @@ const unsigned int);
 vector signed long long, const unsigned int);
 @exdent vector unsigned long long vec_sldb (vector unsigned long long,
 vector unsigned long long, const unsigned int);
+@exdent vector signed __int128 vec_sldb (vector signed __int128,
+vector signed __int128, const unsigned int);
+@exdent vector unsigned __int128 vec_sldb (vector unsigned __int128,
+vector unsigned __int128, const unsigned int);
 @end smallexample

 Shift the combined input vectors left by the amount specified by the low-order
@@ -23531,12 +23535,51 @@ const unsigned int);
 vector signed long long, const unsigned int);
 @exdent vector unsigned long long vec_srdb (vector unsigned long long,
 vector unsigned long long, const unsigned int);
+@exdent vector signed __int128 vec_srdb (vector signed __int128,
+vector signed __int128, const unsigned int);
+@exdent vector unsigned __int128 vec_srdb (vector unsigned __int128,
+vector unsigned __int128, const unsigned int);
 @end smallexample

 Shift the combined input vectors right by the amount specified by the low-order
 three bits of the third argument, and return the remaining 128 bits.  Code
 using this built-in must be endian-aware.

+@smallexample
+@exdent vector signed __int128 vec_sld (vector signed __int128,
+vector signed __int128, const unsigned int);
+@exdent vector unsigned __int128 vec_sld (vector unsigned __int128,
+vector unsigned __int128, const unsigned int);
+@exdent vector signed __int128 vec_sldw (vector signed __int128,
+vector signed __int128, const unsigned int);
+@exdent vector unsigned __int128 vec_sldw (vector unsigned __int128,
+vector unsigned __int128, const unsigned int);
+@exdent vector signed __int128 vec_slo (vector signed __int128,
+vector signed char);
+@exdent vector signed __int128 vec_slo (vector signed __int128,
+vector unsigned char);
+@exdent vector unsigned __int128 vec_slo (vector unsigned __int128,
+vector signed char);
+@exdent vector unsigned __int128 vec_slo (vector unsigned __int128,
+vector unsigned char);
+@exdent vector signed __int128 vec_sro (vector signed __int128,
+vector signed char);
+@exdent vector signed __int128 vec_sro (vector signed __int128,
+vector unsigned char);
+@exdent vector unsigned __int128 vec_sro (vector unsigned __int128,
+vector signed char);
+@exdent vector unsigned __int128 vec_sro (vector unsigned __int128,
+vector unsigned char);
+@exdent vector signed __int128 vec_srl (vector signed __int128,
+vector unsigned char);
+@exdent vector unsigned __int128 vec_srl (vector unsigned __int128,
+vector unsigned char);
+@end smallexample
+
+The above instances are extensions of the existing overloaded built-ins
+@code{vec_sld}, @code{vec_sldw}, @code{vec_slo}, @code{vec_sro}, @code{vec_srl}
+that are documented in the PVIPR.
+
 @findex vec_srdb

 Vector Splat
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable-int128.c b/gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable-int128.c
new file mode 100644
index 00000000000..65e8e94ec07
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable-int128.c
@@ -0,0 +1,358 @@
+/* { dg-do run  { target power10_hw } } */
+/* { dg-do link { target { ! power10_hw } } } */
+/* { dg-require-effective-target power10_ok } */
+
+/* Need -save-temps for dg-final scan-assembler-times at end of test.  */
+/* { dg-options "-mdejagnu-cpu=power10 -save-temps" } */
+
+#include <altivec.h>
+
+#define DEBUG 0
+
+#if DEBUG
+#include <stdio.h>
+
+void print_i128 (unsigned __int128 val)
+{
+  printf(" 0x%016llx%016llx",
+     (unsigned long long)(val >> 64),
+     (unsigned long long)(val & 0xFFFFFFFFFFFFFFFF));
+}
+#endif
+
+extern void abort (void);
+
+#if DEBUG
+#define ACTION_2ARG_UNSIGNED(NAME, TYPE_NAME) \
+  printf ("vec_%s (vector TYPE __int128, vector TYPE __int128) \n", #NAME); \
+  printf(" src_va_s128[0] =      "); \
+  print_i128 ((unsigned __int128) src_va_##TYPE_NAME[0]); \
+  printf("\n"); \
+  printf(" src_vb_uc =            0x"); \
+  for (i = 0; i < 16; i++) \
+    printf("%02x",  src_vb_uc[i]); \
+  printf("\n"); \
+  printf(" vresult[0] =          "); \
+  print_i128 ((unsigned __int128) vresult[0]); \
+  printf("\n"); \
+  printf(" expected_vresult[0] = "); \
+  print_i128 ((unsigned __int128) expected_vresult[0]); \
+  printf("\n");
+
+#define ACTION_2ARG_SIGNED(NAME, TYPE_NAME) \
+  printf ("vec_%s (vector TYPE __int128, vector TYPE __int128) \n", #NAME); \
+  printf(" src_va_s128[0] =      "); \
+  print_i128 ((unsigned __int128) src_va_##TYPE_NAME[0]); \
+  printf("\n"); \
+  printf(" src_vb_sc =            0x"); \
+  for (i = 0; i < 16; i++) \
+    printf("%02x",  src_vb_sc[i]); \
+  printf("\n"); \
+  printf(" vresult[0] =          "); \
+  print_i128 ((unsigned __int128) vresult[0]); \
+  printf("\n"); \
+  printf(" expected_vresult[0] = "); \
+  print_i128 ((unsigned __int128) expected_vresult[0]); \
+  printf("\n");
+
+#define ACTION_3ARG(NAME, TYPE_NAME, CONST) \
+  printf ("vec_%s (vector TYPE __int128, vector TYPE __int128, %s) \n", \
+    #NAME, #CONST); \
+  printf(" src_va_s128[0] =      "); \
+  print_i128 ((unsigned __int128) src_va_##TYPE_NAME[0]); \
+  printf("\n"); \
+  printf(" src_vb_s128[0] =      "); \
+  print_i128 ((unsigned __int128) src_vb_##TYPE_NAME[0]); \
+  printf("\n"); \
+  printf(" vresult[0] =          "); \
+  print_i128 ((unsigned __int128) vresult[0]); \
+  printf("\n"); \
+  printf(" expected_vresult[0] = "); \
+  print_i128 ((unsigned __int128) expected_vresult[0]); \
+  printf("\n");
+
+#else
+#define ACTION_2ARG_UNSIGNED(NAME, TYPE_NAME) \
+  abort();
+
+#define ACTION_2ARG_SIGNED(NAME, TYPE_NAME) \
+  abort();
+
+#define ACTION_2ARG(NAME, TYPE_NAME) \
+  abort();
+
+#define ACTION_3ARG(NAME, TYPE_NAME, CONST) \
+  abort();
+#endif
+
+/* Second argument of the builtin is vector unsigned char.  */
+#define TEST_2ARG_UNSIGNED(NAME, TYPE, TYPE_NAME, EXP_RESULT_HI, EXP_RESULT_LO) \
+  { \
+    vector TYPE __int128 vresult; \
+    vector TYPE __int128 expected_vresult; \
+    int i; \
+    \
+    expected_vresult = (vector TYPE __int128) { EXP_RESULT_HI }; \
+    expected_vresult = (expected_vresult << 64) | \
+      (vector TYPE __int128) { EXP_RESULT_LO }; \
+    vresult = vec_##NAME (src_va_##TYPE_NAME, src_vb_uc); \
+    \
+    if (!vec_all_eq (vresult,  expected_vresult)) { \
+      ACTION_2ARG_UNSIGNED(NAME, TYPE_NAME) \
+    } \
+  }
+
+/* Second argument of the builtin is vector signed char.  */
+#define TEST_2ARG_SIGNED(NAME, TYPE, TYPE_NAME, EXP_RESULT_HI, EXP_RESULT_LO) \
+  { \
+    vector TYPE __int128 vresult; \
+    vector TYPE __int128 expected_vresult; \
+    int i; \
+    \
+    expected_vresult = (vector TYPE __int128) { EXP_RESULT_HI }; \
+    expected_vresult = (expected_vresult << 64) | \
+      (vector TYPE __int128) { EXP_RESULT_LO }; \
+    vresult = vec_##NAME (src_va_##TYPE_NAME, src_vb_sc); \
+    \
+    if (!vec_all_eq (vresult,  expected_vresult)) { \
+      ACTION_2ARG_SIGNED(NAME, TYPE_NAME) \
+    } \
+  }
+
+#define TEST_3ARG(NAME, TYPE, TYPE_NAME, CONST, EXP_RESULT_HI, EXP_RESULT_LO) \
+  { \
+    vector TYPE __int128 vresult; \
+    vector TYPE __int128 expected_vresult; \
+    \
+    expected_vresult = (vector TYPE __int128) { EXP_RESULT_HI }; \
+    expected_vresult = (expected_vresult << 64) | \
+      (vector TYPE __int128) { EXP_RESULT_LO }; \
+    vresult = vec_##NAME (src_va_##TYPE_NAME, src_vb_##TYPE_NAME, CONST); \
+    \
+    if (!vec_all_eq (vresult,  expected_vresult)) { \
+      ACTION_3ARG(NAME, TYPE_NAME, CONST) \
+    } \
+  }
+
+int
+main (int argc, char *argv [])
+{
+  vector signed __int128 vresult_s128;
+  vector signed __int128 expected_vresult_s128;
+  vector signed __int128 src_va_s128;
+  vector signed __int128 src_vb_s128;
+  vector unsigned __int128 vresult_u128;
+  vector unsigned __int128 expected_vresult_u128;
+  vector unsigned __int128 src_va_u128;
+  vector unsigned __int128 src_vb_u128;
+  vector signed char src_vb_sc;
+  vector unsigned char src_vb_uc;
+
+  /* 128-bit vector shift right tests, vec_srdb. */
+  src_va_s128 = (vector signed __int128) {0x12345678};
+  src_vb_s128 = (vector signed __int128) {0xFEDCBA90};
+  TEST_3ARG(srdb, signed, s128, 4, 0x8000000000000000, 0xFEDCBA9)
+
+  src_va_u128 = (vector unsigned __int128) { 0xFEDCBA98 };
+  src_vb_u128 = (vector unsigned __int128) { 0x76543210};
+  TEST_3ARG(srdb, unsigned, u128, 4, 0x8000000000000000, 0x07654321)
+
+  /* 128-bit vector shift left tests, vec_sldb. */
+  src_va_s128 = (vector signed __int128) {0x123456789ABCDEF0};
+  src_va_s128 = (src_va_s128 << 64)
+    | (vector signed __int128) {0x123456789ABCDEF0};
+  src_vb_s128 = (vector signed __int128) {0xFEDCBA9876543210};
+  src_vb_s128 = (src_vb_s128 << 64)
+    | (vector signed __int128) {0xFEDCBA9876543210};
+  TEST_3ARG(sldb, signed, s128, 4, 0x23456789ABCDEF01, 0x23456789ABCDEF0F)
+
+  src_va_u128 = (vector unsigned __int128) {0xFEDCBA9876543210};
+  src_va_u128 = src_va_u128 << 64
+    | (vector unsigned __int128) {0xFEDCBA9876543210};
+  src_vb_u128 = (vector unsigned __int128) {0x123456789ABCDEF0};
+  src_vb_u128 = src_vb_u128 << 64
+    | (vector unsigned __int128) {0x123456789ABCDEF0};
+  TEST_3ARG(sldb, unsigned, u128, 4, 0xEDCBA9876543210F, 0xEDCBA98765432101)
+
+  /* Shift left by octet tests, vec_sld.  Shift is by immediate value
+     times 8. */
+  src_va_s128 = (vector signed __int128) {0x123456789ABCDEF0};
+  src_va_s128 = (src_va_s128 << 64)
+    | (vector signed __int128) {0x123456789ABCDEF0};
+  src_vb_s128 = (vector signed __int128) {0xFEDCBA9876543210};
+  src_vb_s128 = (src_vb_s128 << 64)
+    | (vector signed __int128) {0xFEDCBA9876543210};
+  TEST_3ARG(sld, signed, s128, 4, 0x9abcdef012345678, 0x9abcdef0fedcba98)
+
+  src_va_u128 = (vector unsigned __int128) {0xFEDCBA9876543210};
+  src_va_u128 = src_va_u128 << 64
+    | (vector unsigned __int128) {0xFEDCBA9876543210};
+  src_vb_u128 = (vector unsigned __int128) {0x123456789ABCDEF0};
+  src_vb_u128 = src_vb_u128 << 64
+    | (vector unsigned __int128) {0x123456789ABCDEF0};
+  TEST_3ARG(sld, unsigned, u128, 4, 0x76543210fedcba98, 0x7654321012345678)
+
+  /* Vector shift left by bits within the vector, vec_sll. */
+  src_va_s128 = (vector signed __int128) {0x123456789ABCDEF0};
+  src_va_s128 = (src_va_s128 << 64)
+    | (vector signed __int128) {0x123456789ABCDEF0};
+  src_vb_uc = (vector unsigned char) {0x01, 0x01, 0x01, 0x01,
+                      0x01, 0x01, 0x01, 0x01,
+                      0x01, 0x01, 0x01, 0x01,
+                      0x01, 0x01, 0x01, 0x01};
+  TEST_2ARG_UNSIGNED(sll, signed, s128, 0x2468acf13579bde0,
+             0x2468acf13579bde0)
+
+  src_va_u128 = (vector unsigned __int128) {0x123456789ABCDEF0};
+  src_va_u128 = src_va_u128 << 64
+    | (vector unsigned __int128) {0x123456789ABCDEF0};
+  src_vb_uc = (vector unsigned char) {0x02, 0x02, 0x02, 0x02,
+                      0x02, 0x02, 0x02, 0x02,
+                      0x02, 0x02, 0x02, 0x02,
+                      0x02, 0x02, 0x02, 0x02};
+  TEST_2ARG_UNSIGNED(sll, unsigned, u128, 0x48d159e26af37bc0,
+             0x48d159e26af37bc0)
+
+  /* Vector shift right by bits within the vector, vec_srl. */
+  src_va_s128 = (vector signed __int128) {0x123456789ABCDEF0};
+  src_va_s128 = (src_va_s128 << 64)
+    | (vector signed __int128) {0x123456789ABCDEF0};
+  src_vb_uc = (vector unsigned char) {0x01, 0x01, 0x01, 0x01,
+                      0x01, 0x01, 0x01, 0x01,
+                      0x01, 0x01, 0x01, 0x01,
+                      0x01, 0x01, 0x01, 0x01};
+  TEST_2ARG_UNSIGNED(srl, signed, s128, 0x091a2b3c4d5e6f78,
+             0x091a2b3c4d5e6f78)
+
+  src_va_u128 = (vector unsigned __int128) {0x123456789ABCDEF0};
+  src_va_u128 = src_va_u128 << 64
+    | (vector unsigned __int128) {0x123456789ABCDEF0};
+  src_vb_uc = (vector unsigned char) {0x02, 0x02, 0x02, 0x02,
+                      0x02, 0x02, 0x02, 0x02,
+                      0x02, 0x02, 0x02, 0x02,
+                      0x02, 0x02, 0x02, 0x02};
+  TEST_2ARG_UNSIGNED(srl, unsigned, u128, 0x48d159e26af37bc,
+             0x48d159e26af37bc)
+
+  /* Shift left by octet tests, vec_slo.  Shift is by immediate value
+     bytes.  Shift amount in bits 121:124.  */
+  src_va_s128 = (vector signed __int128) {0x123456789ABCDEF0};
+  src_va_s128 = (src_va_s128 << 64)
+    | (vector signed __int128) {0x123456789ABCDEF0};
+  /* Note vb_sc is Endian specific, this is just LE.  */
+  /* The left shift amount is 1 byte, i.e. 1 * 8 bits.  */
+  src_vb_sc = (vector signed char) {0x1 << 3, 0x0, 0x0, 0x0,
+                    0x0, 0x0, 0x0, 0x0,
+                    0x0, 0x0, 0x0, 0x0,
+                    0x0, 0x0, 0x0, 0x0};
+
+  TEST_2ARG_SIGNED(slo, signed, s128, 0x3456789ABCDEF012,
+           0x3456789ABCDEF000)
+  src_va_s128 = (vector signed __int128) {0x123456789ABCDEF0};
+  src_va_s128 = (src_va_s128 << 64)
+    | (vector signed __int128) {0x123456789ABCDEF0};
+  /* Note vb_uc is Endian specific, this is just LE.  */
+  /* The left shift amount is 2 bytes, i.e. 2 * 8 bits.  */
+  src_vb_uc = (vector unsigned char) {0x2 << 3, 0x0, 0x0, 0x0,
+                      0x0, 0x0, 0x0, 0x0,
+                      0x0, 0x0, 0x0, 0x0,
+                      0x0, 0x0, 0x0, 0x0};
+  TEST_2ARG_UNSIGNED(slo, signed, s128, 0x56789ABCDEF01234,
+             0x56789ABCDEF00000)
+
+  src_va_u128 = (vector unsigned __int128) {0xFEDCBA9876543210};
+  src_va_u128 = src_va_u128 << 64
+    | (vector unsigned __int128) {0xFEDCBA9876543210};
+  /* The left shift amount is 3 bytes, i.e. 3 * 8 bits.  */
+  src_vb_sc = (vector signed char) {0x03<<3, 0x0, 0x0, 0x0,
+                    0x0, 0x0, 0x0, 0x0,
+                    0x0, 0x0, 0x0, 0x0,
+                    0x00, 0x00, 0x00, 0x0};
+  TEST_2ARG_SIGNED(slo, unsigned, u128, 0x9876543210FEDCBA,
+               0x9876543210000000)
+
+  src_va_u128 = (vector unsigned __int128) {0xFEDCBA9876543210};
+  src_va_u128 = src_va_u128 << 64
+    | (vector unsigned __int128) {0xFEDCBA9876543210};
+  /* The left shift amount is 4 bytes, i.e. 4 * 8 bits.  */
+  src_vb_uc = (vector unsigned char) {0x04<<3, 0x0, 0x0, 0x0,
+                      0x0, 0x0, 0x0, 0x0,
+                      0x0, 0x0, 0x0, 0x0,
+                      0x00, 0x00, 0x00, 0x0};
+  TEST_2ARG_UNSIGNED(slo, unsigned, u128, 0x76543210FEDCBA98,
+               0x7654321000000000)
+
+  /* Shift right by octet tests, vec_sro.  Shift is by immediate value
+     times 8.  Shift amount in bits 121:124.  */
+  src_va_s128 = (vector signed __int128) {0x123456789ABCDEF0};
+  src_va_s128 = (src_va_s128 << 64)
+    | (vector signed __int128) {0x123456789ABCDEF0};
+  /* Note vb_sc is Endian specific, this is just LE.  */
+  /* The right shift amount is 1 byte, i.e. 1 * 8 bits.  */
+  src_vb_sc = (vector signed char) {0x1 << 3, 0x0, 0x0, 0x0,
+                    0x0, 0x0, 0x0, 0x0,
+                    0x0, 0x0, 0x0, 0x0,
+                    0x0, 0x0, 0x0, 0x0};
+  TEST_2ARG_SIGNED(sro, signed, s128, 0x00123456789ABCDE, 0xF0123456789ABCDE)
+
+  src_va_s128 = (vector signed __int128) {0x123456789ABCDEF0};
+  src_va_s128 = (src_va_s128 << 64)
+    | (vector signed __int128) {0x123456789ABCDEF0};
+  /* Note vb_uc is Endian specific, this is just LE.  */
+  /* The right shift amount is 2 bytes, i.e. 2 * 8 bits.  */
+  src_vb_uc = (vector unsigned char) {0x2 << 3, 0x0, 0x0, 0x0,
+                      0x0, 0x0, 0x0, 0x0,
+                      0x0, 0x0, 0x0, 0x0,
+                      0x0, 0x0, 0x0, 0x0};
+  TEST_2ARG_UNSIGNED(sro, signed, s128, 0x0000123456789ABC,
+             0xDEF0123456789ABC)
+
+  src_va_u128 = (vector unsigned __int128) {0xFEDCBA9876543210};
+  src_va_u128 = src_va_u128 << 64
+    | (vector unsigned __int128) {0xFEDCBA9876543210};
+  /* The right shift amount is 3 bytes, i.e. 3 * 8 bits.  */
+  src_vb_sc = (vector signed char) {0x03<<3, 0x0, 0x0, 0x0,
+                    0x0, 0x0, 0x0, 0x0,
+                    0x0, 0x0, 0x0, 0x0,
+                    0x00, 0x00, 0x00, 0x0};
+  TEST_2ARG_SIGNED(sro, unsigned, u128, 0x000000FEDCBA9876,
+           0x543210FEDCBA9876)
+
+  src_va_u128 = (vector unsigned __int128) {0xFEDCBA9876543210};
+  src_va_u128 = src_va_u128 << 64
+    | (vector unsigned __int128) {0xFEDCBA9876543210};
+  /* The right shift amount is 4 bytes, i.e. 4 * 8 bits.  */
+  src_vb_uc = (vector unsigned char) {0x04<<3, 0x0, 0x0, 0x0,
+                      0x0, 0x0, 0x0, 0x0,
+                      0x0, 0x0, 0x0, 0x0,
+                      0x00, 0x00, 0x00, 0x0};
+  TEST_2ARG_UNSIGNED(sro, unsigned, u128, 0x00000000FEDCBA98,
+               0x76543210FEDCBA98)
+
+  /* 128-bit vector shift left tests, vec_sldw. */
+  src_va_s128 = (vector signed __int128) {0x123456789ABCDEF0};
+  src_va_s128 = (src_va_s128 << 64)
+    | (vector signed __int128) {0x123456789ABCDEF0};
+  src_vb_s128 = (vector signed __int128) {0xFEDCBA9876543210};
+  src_vb_s128 = (src_vb_s128 << 64)
+    | (vector signed __int128) {0xFEDCBA9876543210};
+  TEST_3ARG(sldw, signed, s128, 1, 0x9ABCDEF012345678, 0x9ABCDEF0FEDCBA98)
+
+  src_va_u128 = (vector unsigned __int128) {0x123456789ABCDEF0};
+  src_va_u128 = (src_va_u128 << 64)
+    | (vector unsigned __int128) {0x123456789ABCDEF0};
+  src_vb_u128 = (vector unsigned __int128) {0xFEDCBA9876543210};
+  src_vb_u128 = (src_vb_u128 << 64)
+    | (vector unsigned __int128) {0xFEDCBA9876543210};
+  TEST_3ARG(sldw, unsigned, u128, 2, 0x123456789ABCDEF0, 0xFEDCBA9876543210)
+