From patchwork Mon Nov 13 16:30:09 2017
X-Patchwork-Submitter: Bill Schmidt
X-Patchwork-Id: 837472
To: GCC Patches <gcc-patches@gcc.gnu.org>
Cc: Segher Boessenkool, David Edelsohn, cel@linux.vnet.ibm.com
From: Bill Schmidt
Subject: [PATCH, rs6000] Repair vec_xl, vec_xst, vec_xl_be, vec_xst_be built-in functions
Date: Mon, 13 Nov 2017 10:30:09 -0600
Message-Id: <49ed24c9-7edb-428a-a960-fb7dc443d82a@linux.vnet.ibm.com>

Hi,

Some previous patches to add support for vec_xl_be, and to fill in gaps in
testing for vec_xl and vec_xst, resulted in incorrect semantics for these
built-in functions.  My fault for not reviewing them in detail.  Essentially
the vec_xl and vec_xl_be semantics were reversed, and vec_xst received the
semantics that should be associated with vec_xst_be.  In addition,
vec_xst_be has not yet been implemented, and my attempt to add P8-specific
code for the element-reversing loads and stores had the same problem with
wrong semantics.

This patch addresses these issues in a uniform manner.  Most of the work is
done by adjusting the rs6000-c.c mapping between overloaded function names
and type-specific implementations, and by adding the missing interfaces
there.  I've also removed the previous implementation function
altivec_expand_xl_be_builtin, and moved the byte reversal into
define_expands for the code generation patterns, with some simplification.
These generate single instructions for P9 (lxvh8x, lxvb16x, etc.) and
lxvw4x + permute for P8.

I've verified the code generation by hand, but have not included test cases
in this patch because Carl Love has been independently working on adding a
full set of test cases for these built-ins.  The code does pass the relevant
tests that are already in the test suite.  I hope it's okay to install this
patch while waiting for the full tests; I'll of course give high priority to
fixing any fallout from those tests.

Bootstrapped and tested on powerpc64le-linux-gnu (POWER8 and POWER9) with no
regressions.
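For reference, a small usage sketch of the intended semantics being restored
(illustrative only, not part of the patch; it assumes a little-endian target
built with -mvsx, and the comments describe the expected element ordering):

  #include <altivec.h>

  unsigned short buf[8] = { 0, 1, 2, 3, 4, 5, 6, 7 };

  vector unsigned short
  load_native (void)
  {
    /* vec_xl: elements appear in the target's native (little-endian)
       element order, so buf[0] ends up in element 0 of the result.  */
    return vec_xl (0, buf);
  }

  vector unsigned short
  load_be (void)
  {
    /* vec_xl_be: elements appear in big-endian element order, i.e.
       reversed relative to vec_xl on a little-endian target, so element 0
       of the result holds buf[7].  */
    return vec_xl_be (0, buf);
  }

  void
  store_be (vector unsigned short v)
  {
    /* vec_xst_be is the store counterpart wired up by this patch.  */
    vec_xst_be (v, 0, buf);
  }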
Is this okay for trunk?

Thanks,
Bill


[gcc]

2017-11-13  Bill Schmidt

	* config/rs6000/altivec.h (vec_xst_be): New #define.
	* config/rs6000/altivec.md (altivec_vperm_<mode>_direct): Rename
	and externalize from *altivec_vperm_<mode>_internal.
	* config/rs6000/rs6000-builtin.def (XL_BE_V16QI): Remove macro
	instantiation.
	(XL_BE_V8HI): Likewise.
	(XL_BE_V4SI): Likewise.
	(XL_BE_V2DI): Likewise.
	(XL_BE_V4SF): Likewise.
	(XL_BE_V2DF): Likewise.
	(XST_BE): Add BU_VSX_OVERLOAD_X macro instantiation.
	* config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Correct
	all array entries with these keys: VSX_BUILTIN_VEC_XL,
	VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_VEC_XST.  Add entries for key
	VSX_BUILTIN_VEC_XST_BE.
	* config/rs6000/rs6000.c (altivec_expand_xl_be_builtin): Remove.
	(altivec_expand_builtin): Remove handling for the
	VSX_BUILTIN_XL_BE_* built-ins.
	(altivec_init_builtins): Replace conditional calls to def_builtin
	for __builtin_vsx_ld_elemrev_{v8hi,v16qi} and
	__builtin_vsx_st_elemrev_{v8hi,v16qi} based on TARGET_P9_VECTOR
	with unconditional calls.  Remove calls to def_builtin for
	__builtin_vsx_le_be_<mode>.  Add a call to def_builtin for
	__builtin_vec_xst_be.
	* config/rs6000/vsx.md (vsx_ld_elemrev_v8hi): Convert define_insn
	to define_expand, and add alternate RTL generation for P8.
	(*vsx_ld_elemrev_v8hi_internal): New define_insn based on
	vsx_ld_elemrev_v8hi.
	(vsx_ld_elemrev_v16qi): Convert define_insn to define_expand, and
	add alternate RTL generation for P8.
	(*vsx_ld_elemrev_v16qi_internal): New define_insn based on
	vsx_ld_elemrev_v16qi.
	(vsx_st_elemrev_v8hi): Convert define_insn to define_expand, and
	add alternate RTL generation for P8.
	(*vsx_st_elemrev_v8hi_internal): New define_insn based on
	vsx_st_elemrev_v8hi.
	(vsx_st_elemrev_v16qi): Convert define_insn to define_expand, and
	add alternate RTL generation for P8.
	(*vsx_st_elemrev_v16qi_internal): New define_insn based on
	vsx_st_elemrev_v16qi.

[gcc/testsuite]

2017-11-13  Bill Schmidt

	* gcc.target/powerpc/swaps-p8-26.c: Modify expected code
	generation.
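As background for the vsx.md changes below: when TARGET_P9_VECTOR is not
set, the new define_expands fall back to the existing word-granularity
element-reversing patterns plus a constant vperm that fixes up the order
within each word.  A rough C-level analogue of the V8HI load path, written
with GNU vector extensions rather than the real RTL (an illustrative sketch
only; the function name is made up):

  typedef unsigned short v8hi __attribute__ ((vector_size (16)));
  typedef unsigned int   v4si __attribute__ ((vector_size (16)));

  v8hi
  xl_be_v8hi_p8_analogue (const v4si *p)
  {
    /* Step 1: element-reversing load at word granularity; on P8 this is
       what vsx_ld_elemrev_v4si provides.  Modeled here as a load plus a
       word shuffle.  */
    v4si w = *p;
    v4si rev_words = __builtin_shuffle (w, (v4si) { 3, 2, 1, 0 });

    /* Step 2: the words are reversed, but the two halfwords inside each
       word are still in the wrong order; a constant permute (vperm in the
       real expansion) swaps them.  */
    v8hi h = (v8hi) rev_words;
    return __builtin_shuffle (h, (v8hi) { 1, 0, 3, 2, 5, 4, 7, 6 });
  }

The reorder arrays in the actual expansions ({13,12,15,14,...} for V8HI and
{12,13,14,15,...} for V16QI) express this same fix-up at the byte level for
the vperm instruction.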
Index: gcc/config/rs6000/altivec.h
===================================================================
--- gcc/config/rs6000/altivec.h	(revision 254629)
+++ gcc/config/rs6000/altivec.h	(working copy)
@@ -357,6 +357,7 @@
 #define vec_xl __builtin_vec_vsx_ld
 #define vec_xl_be __builtin_vec_xl_be
 #define vec_xst __builtin_vec_vsx_st
+#define vec_xst_be __builtin_vec_xst_be
 
 /* Note, xxsldi and xxpermdi were added as __builtin_vsx_<xxx> functions
    instead of __builtin_vec_<xxx>  */
Index: gcc/config/rs6000/altivec.md
===================================================================
--- gcc/config/rs6000/altivec.md	(revision 254629)
+++ gcc/config/rs6000/altivec.md	(working copy)
@@ -2130,7 +2130,7 @@
 })
 
 ;; Slightly prefer vperm, since the target does not overlap the source
-(define_insn "*altivec_vperm_<mode>_internal"
+(define_insn "altivec_vperm_<mode>_direct"
   [(set (match_operand:VM 0 "register_operand" "=v,?wo")
 	(unspec:VM [(match_operand:VM 1 "register_operand" "v,wo")
 		    (match_operand:VM 2 "register_operand" "v,0")
Index: gcc/config/rs6000/rs6000-builtin.def
===================================================================
--- gcc/config/rs6000/rs6000-builtin.def	(revision 254629)
+++ gcc/config/rs6000/rs6000-builtin.def	(working copy)
@@ -1728,14 +1728,6 @@ BU_VSX_X (LXVW4X_V4SF, "lxvw4x_v4sf", MEM)
 BU_VSX_X (LXVW4X_V4SI, "lxvw4x_v4si", MEM)
 BU_VSX_X (LXVW4X_V8HI, "lxvw4x_v8hi", MEM)
 BU_VSX_X (LXVW4X_V16QI, "lxvw4x_v16qi", MEM)
-
-BU_VSX_X (XL_BE_V16QI, "xl_be_v16qi", MEM)
-BU_VSX_X (XL_BE_V8HI, "xl_be_v8hi", MEM)
-BU_VSX_X (XL_BE_V4SI, "xl_be_v4si", MEM)
-BU_VSX_X (XL_BE_V2DI, "xl_be_v2di", MEM)
-BU_VSX_X (XL_BE_V4SF, "xl_be_v4sf", MEM)
-BU_VSX_X (XL_BE_V2DF, "xl_be_v2df", MEM)
-
 BU_VSX_X (STXSDX, "stxsdx", MEM)
 BU_VSX_X (STXVD2X_V1TI, "stxvd2x_v1ti", MEM)
 BU_VSX_X (STXVD2X_V2DF, "stxvd2x_v2df", MEM)
@@ -1838,6 +1830,7 @@ BU_VSX_OVERLOAD_X (ST, "st")
 BU_VSX_OVERLOAD_X (XL, "xl")
 BU_VSX_OVERLOAD_X (XL_BE, "xl_be")
 BU_VSX_OVERLOAD_X (XST, "xst")
+BU_VSX_OVERLOAD_X (XST_BE, "xst_be")
 
 /* 1 argument builtins pre ISA 2.04.  */
 BU_FP_MISC_1 (FCTID, "fctid", CONST, lrintdfdi2)
Index: gcc/config/rs6000/rs6000-c.c
===================================================================
--- gcc/config/rs6000/rs6000-c.c	(revision 254629)
+++ gcc/config/rs6000/rs6000-c.c	(working copy)
@@ -3055,69 +3055,94 @@ const struct altivec_builtin_types altivec_overloa
     RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
   { ALTIVEC_BUILTIN_VEC_SUMS, ALTIVEC_BUILTIN_VSUMSWS,
     RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
-  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LD_ELEMREV_V2DF,
+
+  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVD2X_V2DF,
     RS6000_BTI_V2DF, RS6000_BTI_INTSI, ~RS6000_BTI_V2DF, 0 },
-  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LD_ELEMREV_V2DF,
+  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVD2X_V2DF,
     RS6000_BTI_V2DF, RS6000_BTI_INTSI, ~RS6000_BTI_double, 0 },
-  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LD_ELEMREV_V2DI,
+  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVD2X_V2DI,
     RS6000_BTI_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_V2DI, 0 },
-  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LD_ELEMREV_V2DI,
+  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVD2X_V2DI,
     RS6000_BTI_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_long_long, 0 },
-  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LD_ELEMREV_V2DI,
+  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVD2X_V2DI,
    RS6000_BTI_unsigned_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_unsigned_V2DI, 0 },
-  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LD_ELEMREV_V2DI,
+  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVD2X_V2DI,
    RS6000_BTI_unsigned_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_unsigned_long_long, 0 },
-  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LD_ELEMREV_V4SF,
+  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVW4X_V4SF,
     RS6000_BTI_V4SF, RS6000_BTI_INTSI, ~RS6000_BTI_V4SF, 0 },
-  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LD_ELEMREV_V4SF,
+  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVW4X_V4SF,
     RS6000_BTI_V4SF, RS6000_BTI_INTSI, ~RS6000_BTI_float, 0 },
-  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LD_ELEMREV_V4SI,
+  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVW4X_V4SI,
     RS6000_BTI_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_V4SI, 0 },
-  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LD_ELEMREV_V4SI,
+  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVW4X_V4SI,
     RS6000_BTI_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_INTSI, 0 },
-  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LD_ELEMREV_V4SI,
+  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVW4X_V4SI,
    RS6000_BTI_unsigned_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_unsigned_V4SI, 0 },
-  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LD_ELEMREV_V4SI,
+  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVW4X_V4SI,
     RS6000_BTI_unsigned_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTSI, 0 },
-  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LD_ELEMREV_V8HI,
+  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVW4X_V8HI,
    RS6000_BTI_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_V8HI, 0 },
-  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LD_ELEMREV_V8HI,
+  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVW4X_V8HI,
    RS6000_BTI_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_INTHI, 0 },
-  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LD_ELEMREV_V8HI,
+  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVW4X_V8HI,
    RS6000_BTI_unsigned_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_unsigned_V8HI, 0 },
-  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LD_ELEMREV_V8HI,
+  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVW4X_V8HI,
    RS6000_BTI_unsigned_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTHI, 0 },
-  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LD_ELEMREV_V16QI,
+  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVW4X_V16QI,
    RS6000_BTI_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_V16QI, 0 },
-  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LD_ELEMREV_V16QI,
+  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVW4X_V16QI,
    RS6000_BTI_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_INTQI, 0 },
-  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LD_ELEMREV_V16QI,
+  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVW4X_V16QI,
    RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_unsigned_V16QI, 0 },
-  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LD_ELEMREV_V16QI,
+  { VSX_BUILTIN_VEC_XL, VSX_BUILTIN_LXVW4X_V16QI,
    RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTQI, 0 },
-  { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_XL_BE_V16QI,
-    RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTQI, 0 },
-  { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_XL_BE_V16QI,
-    RS6000_BTI_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_INTQI, 0 },
-  { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_XL_BE_V8HI,
+
+  { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_LD_ELEMREV_V2DF,
+    RS6000_BTI_V2DF, RS6000_BTI_INTSI, ~RS6000_BTI_V2DF, 0 },
+  { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_LD_ELEMREV_V2DF,
+    RS6000_BTI_V2DF, RS6000_BTI_INTSI, ~RS6000_BTI_double, 0 },
+  { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_LD_ELEMREV_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_V2DI, 0 },
+  { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_LD_ELEMREV_V2DI,
+    RS6000_BTI_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_long_long, 0 },
+  { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_LD_ELEMREV_V2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_INTSI,
+    ~RS6000_BTI_unsigned_V2DI, 0 },
+  { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_LD_ELEMREV_V2DI,
+    RS6000_BTI_unsigned_V2DI, RS6000_BTI_INTSI,
+    ~RS6000_BTI_unsigned_long_long, 0 },
+  { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_LD_ELEMREV_V4SF,
+    RS6000_BTI_V4SF, RS6000_BTI_INTSI, ~RS6000_BTI_V4SF, 0 },
+  { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_LD_ELEMREV_V4SF,
+    RS6000_BTI_V4SF, RS6000_BTI_INTSI, ~RS6000_BTI_float, 0 },
+  { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_LD_ELEMREV_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_V4SI, 0 },
+  { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_LD_ELEMREV_V4SI,
+    RS6000_BTI_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_INTSI, 0 },
+  { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_LD_ELEMREV_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_unsigned_V4SI, 0 },
+  { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_LD_ELEMREV_V4SI,
+    RS6000_BTI_unsigned_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTSI, 0 },
+  { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_LD_ELEMREV_V8HI,
+    RS6000_BTI_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_V8HI, 0 },
+  { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_LD_ELEMREV_V8HI,
     RS6000_BTI_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_INTHI, 0 },
-  { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_XL_BE_V8HI,
+  { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_LD_ELEMREV_V8HI,
+    RS6000_BTI_unsigned_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_unsigned_V8HI, 0 },
+  { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_LD_ELEMREV_V8HI,
     RS6000_BTI_unsigned_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTHI, 0 },
-  { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_XL_BE_V4SI,
-    RS6000_BTI_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_INTSI, 0 },
-  { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_XL_BE_V4SI,
-    RS6000_BTI_unsigned_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTSI, 0 },
-  { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_XL_BE_V2DI,
-    RS6000_BTI_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_long_long, 0 },
-  { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_XL_BE_V2DI,
-    RS6000_BTI_unsigned_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_unsigned_long_long, 0 },
-  { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_XL_BE_V4SF,
-    RS6000_BTI_V4SF, RS6000_BTI_INTSI, ~RS6000_BTI_float, 0 },
-  { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_XL_BE_V2DF,
-    RS6000_BTI_V2DF, RS6000_BTI_INTSI, ~RS6000_BTI_double, 0 },
+  { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_LD_ELEMREV_V16QI,
+    RS6000_BTI_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_V16QI, 0 },
+  { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_LD_ELEMREV_V16QI,
+    RS6000_BTI_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_INTQI, 0 },
+  { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_LD_ELEMREV_V16QI,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI,
+    ~RS6000_BTI_unsigned_V16QI, 0 },
+  { VSX_BUILTIN_VEC_XL_BE, VSX_BUILTIN_LD_ELEMREV_V16QI,
+    RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTQI, 0 },
   { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR,
     RS6000_BTI_V4SF, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
   { ALTIVEC_BUILTIN_VEC_XOR, ALTIVEC_BUILTIN_VXOR,
@@ -3893,53 +3918,111 @@ const struct altivec_builtin_types altivec_overloa
    RS6000_BTI_void, RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_unsigned_V16QI },
   { ALTIVEC_BUILTIN_VEC_STVRXL, ALTIVEC_BUILTIN_STVRXL,
    RS6000_BTI_void, RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTQI },
-  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_ST_ELEMREV_V2DF,
+  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_STXVD2X_V2DF,
     RS6000_BTI_void, RS6000_BTI_V2DF, RS6000_BTI_INTSI, ~RS6000_BTI_V2DF },
-  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_ST_ELEMREV_V2DF,
+  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_STXVD2X_V2DI,
+    RS6000_BTI_void, RS6000_BTI_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_V2DI },
+  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_STXVD2X_V2DI,
+    RS6000_BTI_void, RS6000_BTI_unsigned_V2DI, RS6000_BTI_INTSI,
+    ~RS6000_BTI_unsigned_V2DI },
+  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_STXVD2X_V2DI,
+    RS6000_BTI_void, RS6000_BTI_bool_V2DI, RS6000_BTI_INTSI,
+    ~RS6000_BTI_bool_V2DI },
+  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_STXVW4X_V4SF,
+    RS6000_BTI_void, RS6000_BTI_V4SF, RS6000_BTI_INTSI, ~RS6000_BTI_V4SF },
+  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_STXVW4X_V4SF,
+    RS6000_BTI_void, RS6000_BTI_V4SF, RS6000_BTI_INTSI, ~RS6000_BTI_float },
+  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_STXVW4X_V4SI,
+    RS6000_BTI_void, RS6000_BTI_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_V4SI },
+  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_STXVW4X_V4SI,
+    RS6000_BTI_void, RS6000_BTI_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_INTSI },
+  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_STXVW4X_V4SI,
+    RS6000_BTI_void, RS6000_BTI_unsigned_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_unsigned_V4SI },
+  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_STXVW4X_V4SI,
+    RS6000_BTI_void, RS6000_BTI_unsigned_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTSI },
+  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_STXVW4X_V4SI,
+    RS6000_BTI_void, RS6000_BTI_bool_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_bool_V4SI },
+  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_STXVW4X_V4SI,
+    RS6000_BTI_void, RS6000_BTI_bool_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTSI },
+  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_STXVW4X_V4SI,
+    RS6000_BTI_void, RS6000_BTI_bool_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_INTSI },
+  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_STXVW4X_V8HI,
+    RS6000_BTI_void, RS6000_BTI_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_V8HI },
+  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_STXVW4X_V8HI,
+    RS6000_BTI_void, RS6000_BTI_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_INTHI },
+  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_STXVW4X_V8HI,
+    RS6000_BTI_void, RS6000_BTI_unsigned_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_unsigned_V8HI },
+  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_STXVW4X_V8HI,
+    RS6000_BTI_void, RS6000_BTI_unsigned_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTHI },
+  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_STXVW4X_V8HI,
+    RS6000_BTI_void, RS6000_BTI_bool_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_bool_V8HI },
+  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_STXVW4X_V8HI,
+    RS6000_BTI_void, RS6000_BTI_bool_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTHI },
+  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_STXVW4X_V8HI,
+    RS6000_BTI_void, RS6000_BTI_bool_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_INTHI },
+  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_STXVW4X_V16QI,
+    RS6000_BTI_void, RS6000_BTI_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_V16QI },
+  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_STXVW4X_V16QI,
+    RS6000_BTI_void, RS6000_BTI_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_INTQI },
+  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_STXVW4X_V16QI,
+    RS6000_BTI_void, RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_unsigned_V16QI },
+  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_STXVW4X_V16QI,
+    RS6000_BTI_void, RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTQI },
+  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_STXVW4X_V16QI,
+    RS6000_BTI_void, RS6000_BTI_bool_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_bool_V16QI },
+  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_STXVW4X_V16QI,
+    RS6000_BTI_void, RS6000_BTI_bool_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTQI },
+  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_STXVW4X_V16QI,
+    RS6000_BTI_void, RS6000_BTI_bool_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_INTQI },
+  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_STXVW4X_V8HI,
+    RS6000_BTI_void, RS6000_BTI_pixel_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_pixel_V8HI },
+  { VSX_BUILTIN_VEC_XST_BE, VSX_BUILTIN_ST_ELEMREV_V2DF,
+    RS6000_BTI_void, RS6000_BTI_V2DF, RS6000_BTI_INTSI, ~RS6000_BTI_V2DF },
+  { VSX_BUILTIN_VEC_XST_BE, VSX_BUILTIN_ST_ELEMREV_V2DF,
     RS6000_BTI_void, RS6000_BTI_V2DF, RS6000_BTI_INTSI, ~RS6000_BTI_double },
-  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_ST_ELEMREV_V2DI,
+  { VSX_BUILTIN_VEC_XST_BE, VSX_BUILTIN_ST_ELEMREV_V2DI,
     RS6000_BTI_void, RS6000_BTI_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_V2DI },
-  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_ST_ELEMREV_V2DI,
+  { VSX_BUILTIN_VEC_XST_BE, VSX_BUILTIN_ST_ELEMREV_V2DI,
     RS6000_BTI_void, RS6000_BTI_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_long_long },
-  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_ST_ELEMREV_V2DI,
+  { VSX_BUILTIN_VEC_XST_BE, VSX_BUILTIN_ST_ELEMREV_V2DI,
    RS6000_BTI_void, RS6000_BTI_unsigned_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_unsigned_V2DI },
-  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_ST_ELEMREV_V2DI,
+  { VSX_BUILTIN_VEC_XST_BE, VSX_BUILTIN_ST_ELEMREV_V2DI,
    RS6000_BTI_void, RS6000_BTI_unsigned_V2DI, RS6000_BTI_INTSI, ~RS6000_BTI_unsigned_long_long },
-  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_ST_ELEMREV_V4SF,
+  { VSX_BUILTIN_VEC_XST_BE, VSX_BUILTIN_ST_ELEMREV_V4SF,
     RS6000_BTI_void, RS6000_BTI_V4SF, RS6000_BTI_INTSI, ~RS6000_BTI_V4SF },
-  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_ST_ELEMREV_V4SF,
+  { VSX_BUILTIN_VEC_XST_BE, VSX_BUILTIN_ST_ELEMREV_V4SF,
     RS6000_BTI_void, RS6000_BTI_V4SF, RS6000_BTI_INTSI, ~RS6000_BTI_float },
-  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_ST_ELEMREV_V4SI,
+  { VSX_BUILTIN_VEC_XST_BE, VSX_BUILTIN_ST_ELEMREV_V4SI,
     RS6000_BTI_void, RS6000_BTI_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_V4SI },
-  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_ST_ELEMREV_V4SI,
+  { VSX_BUILTIN_VEC_XST_BE, VSX_BUILTIN_ST_ELEMREV_V4SI,
     RS6000_BTI_void, RS6000_BTI_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_INTSI },
-  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_ST_ELEMREV_V4SI,
+  { VSX_BUILTIN_VEC_XST_BE, VSX_BUILTIN_ST_ELEMREV_V4SI,
    RS6000_BTI_void, RS6000_BTI_unsigned_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_unsigned_V4SI },
-  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_ST_ELEMREV_V4SI,
+  { VSX_BUILTIN_VEC_XST_BE, VSX_BUILTIN_ST_ELEMREV_V4SI,
    RS6000_BTI_void, RS6000_BTI_unsigned_V4SI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTSI },
-  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_ST_ELEMREV_V8HI,
+  { VSX_BUILTIN_VEC_XST_BE, VSX_BUILTIN_ST_ELEMREV_V8HI,
     RS6000_BTI_void, RS6000_BTI_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_V8HI },
-  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_ST_ELEMREV_V8HI,
+  { VSX_BUILTIN_VEC_XST_BE, VSX_BUILTIN_ST_ELEMREV_V8HI,
     RS6000_BTI_void, RS6000_BTI_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_INTHI },
-  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_ST_ELEMREV_V8HI,
+  { VSX_BUILTIN_VEC_XST_BE, VSX_BUILTIN_ST_ELEMREV_V8HI,
    RS6000_BTI_void, RS6000_BTI_unsigned_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_unsigned_V8HI },
-  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_ST_ELEMREV_V8HI,
+  { VSX_BUILTIN_VEC_XST_BE, VSX_BUILTIN_ST_ELEMREV_V8HI,
    RS6000_BTI_void, RS6000_BTI_unsigned_V8HI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTHI },
-  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_ST_ELEMREV_V16QI,
+  { VSX_BUILTIN_VEC_XST_BE, VSX_BUILTIN_ST_ELEMREV_V16QI,
     RS6000_BTI_void, RS6000_BTI_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_V16QI },
-  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_ST_ELEMREV_V16QI,
+  { VSX_BUILTIN_VEC_XST_BE, VSX_BUILTIN_ST_ELEMREV_V16QI,
     RS6000_BTI_void, RS6000_BTI_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_INTQI },
-  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_ST_ELEMREV_V16QI,
+  { VSX_BUILTIN_VEC_XST_BE, VSX_BUILTIN_ST_ELEMREV_V16QI,
    RS6000_BTI_void, RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_unsigned_V16QI },
-  { VSX_BUILTIN_VEC_XST, VSX_BUILTIN_ST_ELEMREV_V16QI,
+  { VSX_BUILTIN_VEC_XST_BE, VSX_BUILTIN_ST_ELEMREV_V16QI,
    RS6000_BTI_void, RS6000_BTI_unsigned_V16QI, RS6000_BTI_INTSI, ~RS6000_BTI_UINTQI },
   { VSX_BUILTIN_VEC_XXSLDWI, VSX_BUILTIN_XXSLDWI_16QI,
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 254629)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -14510,58 +14510,6 @@ altivec_expand_lv_builtin (enum insn_code icode, t
 }
 
 static rtx
-altivec_expand_xl_be_builtin (enum insn_code icode, tree exp, rtx target, bool blk)
-{
-  rtx pat, addr;
-  tree arg0 = CALL_EXPR_ARG (exp, 0);
-  tree arg1 = CALL_EXPR_ARG (exp, 1);
-  machine_mode tmode = insn_data[icode].operand[0].mode;
-  machine_mode mode0 = Pmode;
-  machine_mode mode1 = Pmode;
-  rtx op0 = expand_normal (arg0);
-  rtx op1 = expand_normal (arg1);
-
-  if (icode == CODE_FOR_nothing)
-    /* Builtin not supported on this processor.  */
-    return 0;
-
-  /* If we got invalid arguments bail out before generating bad rtl.  */
-  if (arg0 == error_mark_node || arg1 == error_mark_node)
-    return const0_rtx;
-
-  if (target == 0
-      || GET_MODE (target) != tmode
-      || ! (*insn_data[icode].operand[0].predicate) (target, tmode))
-    target = gen_reg_rtx (tmode);
-
-  op1 = copy_to_mode_reg (mode1, op1);
-
-  if (op0 == const0_rtx)
-    addr = gen_rtx_MEM (blk ? BLKmode : tmode, op1);
-  else
-    {
-      op0 = copy_to_mode_reg (mode0, op0);
-      addr = gen_rtx_MEM (blk ? BLKmode : tmode,
-                          gen_rtx_PLUS (Pmode, op1, op0));
-    }
-
-  pat = GEN_FCN (icode) (target, addr);
-  if (!pat)
-    return 0;
-
-  emit_insn (pat);
-  /* Reverse element order of elements if in LE mode */
-  if (!VECTOR_ELT_ORDER_BIG)
-    {
-      rtx sel = swap_selector_for_mode (tmode);
-      rtx vperm = gen_rtx_UNSPEC (tmode, gen_rtvec (3, target, target, sel),
-                                  UNSPEC_VPERM);
-      emit_insn (gen_rtx_SET (target, vperm));
-    }
-  return target;
-}
-
-static rtx
 paired_expand_stv_builtin (enum insn_code icode, tree exp)
 {
   tree arg0 = CALL_EXPR_ARG (exp, 0);
@@ -15957,50 +15905,6 @@ altivec_expand_builtin (tree exp, rtx target, bool
       /* Fall through.  */
     }
 
-  /* XL_BE  We initialized them to always load in big endian order.  */
-  switch (fcode)
-    {
-    case VSX_BUILTIN_XL_BE_V2DI:
-      {
-        enum insn_code code = CODE_FOR_vsx_load_v2di;
-        return altivec_expand_xl_be_builtin (code, exp, target, false);
-      }
-      break;
-    case VSX_BUILTIN_XL_BE_V4SI:
-      {
-        enum insn_code code = CODE_FOR_vsx_load_v4si;
-        return altivec_expand_xl_be_builtin (code, exp, target, false);
-      }
-      break;
-    case VSX_BUILTIN_XL_BE_V8HI:
-      {
-        enum insn_code code = CODE_FOR_vsx_load_v8hi;
-        return altivec_expand_xl_be_builtin (code, exp, target, false);
-      }
-      break;
-    case VSX_BUILTIN_XL_BE_V16QI:
-      {
-        enum insn_code code = CODE_FOR_vsx_load_v16qi;
-        return altivec_expand_xl_be_builtin (code, exp, target, false);
-      }
-      break;
-    case VSX_BUILTIN_XL_BE_V2DF:
-      {
-        enum insn_code code = CODE_FOR_vsx_load_v2df;
-        return altivec_expand_xl_be_builtin (code, exp, target, false);
-      }
-      break;
-    case VSX_BUILTIN_XL_BE_V4SF:
-      {
-        enum insn_code code = CODE_FOR_vsx_load_v4sf;
-        return altivec_expand_xl_be_builtin (code, exp, target, false);
-      }
-      break;
-    default:
-      break;
-      /* Fall through.  */
-    }
-
   *expandedp = false;
   return NULL_RTX;
 }
@@ -17618,6 +17522,10 @@ altivec_init_builtins (void)
 	       VSX_BUILTIN_LD_ELEMREV_V4SF);
   def_builtin ("__builtin_vsx_ld_elemrev_v4si", v4si_ftype_long_pcvoid,
 	       VSX_BUILTIN_LD_ELEMREV_V4SI);
+  def_builtin ("__builtin_vsx_ld_elemrev_v8hi", v8hi_ftype_long_pcvoid,
+	       VSX_BUILTIN_LD_ELEMREV_V8HI);
+  def_builtin ("__builtin_vsx_ld_elemrev_v16qi", v16qi_ftype_long_pcvoid,
+	       VSX_BUILTIN_LD_ELEMREV_V16QI);
   def_builtin ("__builtin_vsx_st_elemrev_v2df", void_ftype_v2df_long_pvoid,
 	       VSX_BUILTIN_ST_ELEMREV_V2DF);
   def_builtin ("__builtin_vsx_st_elemrev_v2di", void_ftype_v2di_long_pvoid,
@@ -17626,43 +17534,11 @@ altivec_init_builtins (void)
 	       VSX_BUILTIN_ST_ELEMREV_V4SF);
   def_builtin ("__builtin_vsx_st_elemrev_v4si", void_ftype_v4si_long_pvoid,
 	       VSX_BUILTIN_ST_ELEMREV_V4SI);
+  def_builtin ("__builtin_vsx_st_elemrev_v8hi", void_ftype_v8hi_long_pvoid,
+	       VSX_BUILTIN_ST_ELEMREV_V8HI);
+  def_builtin ("__builtin_vsx_st_elemrev_v16qi", void_ftype_v16qi_long_pvoid,
+	       VSX_BUILTIN_ST_ELEMREV_V16QI);
 
-  def_builtin ("__builtin_vsx_le_be_v8hi", v8hi_ftype_long_pcvoid,
-	       VSX_BUILTIN_XL_BE_V8HI);
-  def_builtin ("__builtin_vsx_le_be_v4si", v4si_ftype_long_pcvoid,
-	       VSX_BUILTIN_XL_BE_V4SI);
-  def_builtin ("__builtin_vsx_le_be_v2di", v2di_ftype_long_pcvoid,
-	       VSX_BUILTIN_XL_BE_V2DI);
-  def_builtin ("__builtin_vsx_le_be_v4sf", v4sf_ftype_long_pcvoid,
-	       VSX_BUILTIN_XL_BE_V4SF);
-  def_builtin ("__builtin_vsx_le_be_v2df", v2df_ftype_long_pcvoid,
-	       VSX_BUILTIN_XL_BE_V2DF);
-  def_builtin ("__builtin_vsx_le_be_v16qi", v16qi_ftype_long_pcvoid,
-	       VSX_BUILTIN_XL_BE_V16QI);
-
-  if (TARGET_P9_VECTOR)
-    {
-      def_builtin ("__builtin_vsx_ld_elemrev_v8hi", v8hi_ftype_long_pcvoid,
-		   VSX_BUILTIN_LD_ELEMREV_V8HI);
-      def_builtin ("__builtin_vsx_ld_elemrev_v16qi", v16qi_ftype_long_pcvoid,
-		   VSX_BUILTIN_LD_ELEMREV_V16QI);
-      def_builtin ("__builtin_vsx_st_elemrev_v8hi",
-		   void_ftype_v8hi_long_pvoid, VSX_BUILTIN_ST_ELEMREV_V8HI);
-      def_builtin ("__builtin_vsx_st_elemrev_v16qi",
-		   void_ftype_v16qi_long_pvoid, VSX_BUILTIN_ST_ELEMREV_V16QI);
-    }
-  else
-    {
-      rs6000_builtin_decls[(int) VSX_BUILTIN_LD_ELEMREV_V8HI]
-	= rs6000_builtin_decls[(int) VSX_BUILTIN_LXVW4X_V8HI];
-      rs6000_builtin_decls[(int) VSX_BUILTIN_LD_ELEMREV_V16QI]
-	= rs6000_builtin_decls[(int) VSX_BUILTIN_LXVW4X_V16QI];
-      rs6000_builtin_decls[(int) VSX_BUILTIN_ST_ELEMREV_V8HI]
-	= rs6000_builtin_decls[(int) VSX_BUILTIN_STXVW4X_V8HI];
-      rs6000_builtin_decls[(int) VSX_BUILTIN_ST_ELEMREV_V16QI]
-	= rs6000_builtin_decls[(int) VSX_BUILTIN_STXVW4X_V16QI];
-    }
-
   def_builtin ("__builtin_vec_vsx_ld", opaque_ftype_long_pcvoid,
 	       VSX_BUILTIN_VEC_LD);
   def_builtin ("__builtin_vec_vsx_st", void_ftype_opaque_long_pvoid,
@@ -17673,6 +17549,8 @@ altivec_init_builtins (void)
 	       VSX_BUILTIN_VEC_XL_BE);
   def_builtin ("__builtin_vec_xst", void_ftype_opaque_long_pvoid,
 	       VSX_BUILTIN_VEC_XST);
+  def_builtin ("__builtin_vec_xst_be", void_ftype_opaque_long_pvoid,
+	       VSX_BUILTIN_VEC_XST_BE);
   def_builtin ("__builtin_vec_step", int_ftype_opaque, ALTIVEC_BUILTIN_VEC_STEP);
   def_builtin ("__builtin_vec_splats", opaque_ftype_opaque, ALTIVEC_BUILTIN_VEC_SPLATS);
Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(revision 254629)
+++ gcc/config/rs6000/vsx.md	(working copy)
@@ -1118,7 +1118,7 @@
   "lxvw4x %x0,%y1"
   [(set_attr "type" "vecload")])
 
-(define_insn "vsx_ld_elemrev_v8hi"
+(define_expand "vsx_ld_elemrev_v8hi"
   [(set (match_operand:V8HI 0 "vsx_register_operand" "=wa")
         (vec_select:V8HI
           (match_operand:V8HI 1 "memory_operand" "Z")
@@ -1126,11 +1126,45 @@
 	     (const_int 5) (const_int 4)
 	     (const_int 3) (const_int 2)
 	     (const_int 1) (const_int 0)])))]
+  "VECTOR_MEM_VSX_P (V8HImode) && !BYTES_BIG_ENDIAN"
+{
+  if (!TARGET_P9_VECTOR)
+    {
+      rtx tmp = gen_reg_rtx (V4SImode);
+      rtx subreg, subreg2, perm[16], pcv;
+      /* 2 is leftmost element in register */
+      unsigned int reorder[16] = {13,12,15,14,9,8,11,10,5,4,7,6,1,0,3,2};
+      int i;
+
+      subreg = simplify_gen_subreg (V4SImode, operands[1], V8HImode, 0);
+      emit_insn (gen_vsx_ld_elemrev_v4si (tmp, subreg));
+      subreg2 = simplify_gen_subreg (V8HImode, tmp, V4SImode, 0);
+
+      for (i = 0; i < 16; ++i)
+        perm[i] = GEN_INT (reorder[i]);
+
+      pcv = force_reg (V16QImode,
+                       gen_rtx_CONST_VECTOR (V16QImode,
+                                             gen_rtvec_v (16, perm)));
+      emit_insn (gen_altivec_vperm_v8hi_direct (operands[0], subreg2,
+                                                subreg2, pcv));
+      DONE;
+    }
+})
+
+(define_insn "*vsx_ld_elemrev_v8hi_internal"
+  [(set (match_operand:V8HI 0 "vsx_register_operand" "=wa")
+        (vec_select:V8HI
+          (match_operand:V8HI 1 "memory_operand" "Z")
+          (parallel [(const_int 7) (const_int 6)
+                     (const_int 5) (const_int 4)
+                     (const_int 3) (const_int 2)
+                     (const_int 1) (const_int 0)])))]
   "VECTOR_MEM_VSX_P (V8HImode) && !BYTES_BIG_ENDIAN && TARGET_P9_VECTOR"
   "lxvh8x %x0,%y1"
   [(set_attr "type" "vecload")])
 
-(define_insn "vsx_ld_elemrev_v16qi"
+(define_expand "vsx_ld_elemrev_v16qi"
   [(set (match_operand:V16QI 0 "vsx_register_operand" "=wa")
         (vec_select:V16QI
           (match_operand:V16QI 1 "memory_operand" "Z")
@@ -1142,6 +1176,44 @@
 	     (const_int 5) (const_int 4)
 	     (const_int 3) (const_int 2)
 	     (const_int 1) (const_int 0)])))]
+  "VECTOR_MEM_VSX_P (V16QImode) && !BYTES_BIG_ENDIAN"
+{
+  if (!TARGET_P9_VECTOR)
+    {
+      rtx tmp = gen_reg_rtx (V4SImode);
+      rtx subreg, subreg2, perm[16], pcv;
+      /* 3 is leftmost element in register */
+      unsigned int reorder[16] = {12,13,14,15,8,9,10,11,4,5,6,7,0,1,2,3};
+      int i;
+
+      subreg = simplify_gen_subreg (V4SImode, operands[1], V16QImode, 0);
+      emit_insn (gen_vsx_ld_elemrev_v4si (tmp, subreg));
+      subreg2 = simplify_gen_subreg (V16QImode, tmp, V4SImode, 0);
+
+      for (i = 0; i < 16; ++i)
+        perm[i] = GEN_INT (reorder[i]);
+
+      pcv = force_reg (V16QImode,
+                       gen_rtx_CONST_VECTOR (V16QImode,
+                                             gen_rtvec_v (16, perm)));
+      emit_insn (gen_altivec_vperm_v16qi_direct (operands[0], subreg2,
+                                                 subreg2, pcv));
+      DONE;
+    }
+})
+
+(define_insn "*vsx_ld_elemrev_v16qi_internal"
+  [(set (match_operand:V16QI 0 "vsx_register_operand" "=wa")
+        (vec_select:V16QI
+          (match_operand:V16QI 1 "memory_operand" "Z")
+          (parallel [(const_int 15) (const_int 14)
+                     (const_int 13) (const_int 12)
+                     (const_int 11) (const_int 10)
+                     (const_int 9) (const_int 8)
+                     (const_int 7) (const_int 6)
+                     (const_int 5) (const_int 4)
+                     (const_int 3) (const_int 2)
+                     (const_int 1) (const_int 0)])))]
   "VECTOR_MEM_VSX_P (V16QImode) && !BYTES_BIG_ENDIAN && TARGET_P9_VECTOR"
   "lxvb16x %x0,%y1"
   [(set_attr "type" "vecload")])
@@ -1184,7 +1256,7 @@
   "stxvw4x %x1,%y0"
   [(set_attr "type" "vecstore")])
 
-(define_insn "vsx_st_elemrev_v8hi"
+(define_expand "vsx_st_elemrev_v8hi"
  [(set (match_operand:V8HI 0 "memory_operand" "=Z")
        (vec_select:V8HI
          (match_operand:V8HI 1 "vsx_register_operand" "wa")
@@ -1192,11 +1264,43 @@
 	    (const_int 5) (const_int 4)
 	    (const_int 3) (const_int 2)
 	    (const_int 1) (const_int 0)])))]
+  "VECTOR_MEM_VSX_P (V8HImode) && !BYTES_BIG_ENDIAN"
+{
+  if (!TARGET_P9_VECTOR)
+    {
+      rtx subreg, perm[16], pcv;
+      rtx tmp = gen_reg_rtx (V8HImode);
+      /* 2 is leftmost element in register */
+      unsigned int reorder[16] = {13,12,15,14,9,8,11,10,5,4,7,6,1,0,3,2};
+      int i;
+
+      for (i = 0; i < 16; ++i)
+        perm[i] = GEN_INT (reorder[i]);
+
+      pcv = force_reg (V16QImode,
+                       gen_rtx_CONST_VECTOR (V16QImode,
+                                             gen_rtvec_v (16, perm)));
+      emit_insn (gen_altivec_vperm_v8hi_direct (tmp, operands[1],
+                                                operands[1], pcv));
+      subreg = simplify_gen_subreg (V4SImode, tmp, V8HImode, 0);
+      emit_insn (gen_vsx_st_elemrev_v4si (subreg, operands[0]));
+      DONE;
+    }
+})
+
+(define_insn "*vsx_st_elemrev_v8hi_internal"
+  [(set (match_operand:V8HI 0 "memory_operand" "=Z")
+        (vec_select:V8HI
+          (match_operand:V8HI 1 "vsx_register_operand" "wa")
+          (parallel [(const_int 7) (const_int 6)
+                     (const_int 5) (const_int 4)
+                     (const_int 3) (const_int 2)
+                     (const_int 1) (const_int 0)])))]
   "VECTOR_MEM_VSX_P (V8HImode) && !BYTES_BIG_ENDIAN && TARGET_P9_VECTOR"
   "stxvh8x %x1,%y0"
   [(set_attr "type" "vecstore")])
 
-(define_insn "vsx_st_elemrev_v16qi"
+(define_expand "vsx_st_elemrev_v16qi"
  [(set (match_operand:V16QI 0 "memory_operand" "=Z")
        (vec_select:V16QI
          (match_operand:V16QI 1 "vsx_register_operand" "wa")
@@ -1208,6 +1312,42 @@
 	    (const_int 5) (const_int 4)
 	    (const_int 3) (const_int 2)
 	    (const_int 1) (const_int 0)])))]
+  "VECTOR_MEM_VSX_P (V16QImode) && !BYTES_BIG_ENDIAN"
+{
+  if (!TARGET_P9_VECTOR)
+    {
+      rtx subreg, perm[16], pcv;
+      rtx tmp = gen_reg_rtx (V16QImode);
+      /* 3 is leftmost element in register */
+      unsigned int reorder[16] = {12,13,14,15,8,9,10,11,4,5,6,7,0,1,2,3};
+      int i;
+
+      for (i = 0; i < 16; ++i)
+        perm[i] = GEN_INT (reorder[i]);
+
+      pcv = force_reg (V16QImode,
+                       gen_rtx_CONST_VECTOR (V16QImode,
+                                             gen_rtvec_v (16, perm)));
+      emit_insn (gen_altivec_vperm_v16qi_direct (tmp, operands[1],
+                                                 operands[1], pcv));
+      subreg = simplify_gen_subreg (V4SImode, tmp, V16QImode, 0);
+      emit_insn (gen_vsx_st_elemrev_v4si (subreg, operands[0]));
+      DONE;
+    }
+})
+
+(define_insn "*vsx_st_elemrev_v16qi_internal"
+  [(set (match_operand:V16QI 0 "memory_operand" "=Z")
+        (vec_select:V16QI
+          (match_operand:V16QI 1 "vsx_register_operand" "wa")
+          (parallel [(const_int 15) (const_int 14)
+                     (const_int 13) (const_int 12)
+                     (const_int 11) (const_int 10)
+                     (const_int 9) (const_int 8)
+                     (const_int 7) (const_int 6)
+                     (const_int 5) (const_int 4)
+                     (const_int 3) (const_int 2)
+                     (const_int 1) (const_int 0)])))]
   "VECTOR_MEM_VSX_P (V16QImode) && !BYTES_BIG_ENDIAN && TARGET_P9_VECTOR"
   "stxvb16x %x1,%y0"
   [(set_attr "type" "vecstore")])
Index: gcc/testsuite/gcc.target/powerpc/swaps-p8-26.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/swaps-p8-26.c	(revision 254629)
+++ gcc/testsuite/gcc.target/powerpc/swaps-p8-26.c	(working copy)
@@ -1,11 +1,11 @@
 /* { dg-do compile { target { powerpc64le-*-* } } } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
 /* { dg-options "-mcpu=power8 -O3 " } */
-/* { dg-final { scan-assembler-times "lxvw4x" 2 } } */
-/* { dg-final { scan-assembler "stxvw4x" } } */
+/* { dg-final { scan-assembler-times "lxvd2x" 2 } } */
+/* { dg-final { scan-assembler "stxvd2x" } } */
 /* { dg-final { scan-assembler-not "xxpermdi" } } */
 
-/* Verify that swap optimization does not interfere with element-reversing
+/* Verify that swap optimization does not interfere with unaligned
    loads and stores.  */
 
 /* Test case to resolve PR79044.  */