From patchwork Mon Oct 31 22:28:42 2016
From: Bill Schmidt
Subject: [PATCH, RFC, rs6000] Add overloaded built-in function support to
 altivec.h, and re-implement vec_add
To: GCC Patches
Cc: Segher Boessenkool, David Edelsohn, willschm@vnet.ibm.com,
 Michael Meissner
Date: Mon, 31 Oct 2016 17:28:42 -0500
Message-Id: <4fb8f7f2-ff17-6416-3869-a8576c245dde@linux.vnet.ibm.com>
Hi,

The PowerPC back end loses performance on vector intrinsics because currently all of them are treated as calls throughout the middle-end phases and are only expanded when they reach RTL.  Our version of altivec.h currently defines the public names of overloaded functions (like vec_add) to be #defines for hidden functions (like __builtin_vec_add), which are recognized in the parser as requiring special back-end support.  Tables in rs6000-c.c handle dispatch of the overloaded functions to specific function calls appropriate to the argument types.

The Clang version of altivec.h, by contrast, creates static inlines for each overloaded function variant, relying on a special __attribute__((overloadable)) construct to do the dispatch in the parser itself.  This allows vec_add to be translated immediately into type-specific addition during parsing, so that the resulting expressions are subject to all subsequent optimization.  We have opened a PR suggesting that this attribute be supported in GCC as well (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71199), but so far there hasn't been any success in that regard.

While waiting/hoping for the attribute to be implemented, though, we can use existing mechanisms to create a poor man's version of overloading dispatch.  This patch is a proof of concept for how this can be done, and provides support for early expansion of the overloaded vec_add intrinsic.  If we get this working, then we can gradually add more intrinsics over time.

The dispatch mechanism is provided in a new header file, overload.h, which is included in altivec.h.  This is done because the guts of the dispatch mechanism are pretty ugly to look at.  Overloading is done with a chain of calls to __builtin_choose_expr and __builtin_types_compatible_p.
Currently I envision providing a separate dispatch macro for each combination of the number of arguments and the number of variants to be distinguished.  I also provide a separate "decl" macro for each number of arguments, used to create the function decls for each static inline function.  The vec_add intrinsic takes two input arguments and has 28 variants, so it requires the definition of OVERLOAD_2ARG_28VAR and OVERLOAD_2ARG_DECL in overload.h.

These macros are then instantiated in altivec.h.  The dispatch macro for an overloaded intrinsic is instantiated once, and the decl macro is instantiated once for each variant, along with the associated inline function body.

The dispatch macro may need to vary depending on the supported processor features.  In the vec_add example, we have some variants that support the "vector double" and "vector long long" data types.  These only exist when VSX code generation is supported, so a dispatch table conditioned on __VSX__ includes these, while a separate one without VSX support does not.  Similarly, __POWER8_VECTOR__ must be defined if we are to support "vector signed/unsigned __int128".  Because we use a numbering scheme that needs to be kept consistent, this requires three versions of the dispatch table, where the more restrictive versions replace the unimplemented entries with redundant entries.

Note that if and when we get an overloadable attribute in GCC, the machinery in overload.h will become obsolete: we will remove the dispatch instantiations, and we will replace the decl instantiations with plain decls carrying the overloadable attribute.

There are several complications on top of the basic design:

* When compiling for C++, the dispatch mechanism is not available, and indeed is not necessary.  Thus for C++ we skip the dispatch mechanism, and change the definition of OVERLOAD_2ARG_DECL to use standard function overloading.
* Compiling with -ansi or -std=c11 or the like means the dispatch mechanism is unavailable even for C, since GNU extensions are disallowed.  Regrettably, this means that we can't get rid of the existing late-expansion methods altogether; I don't see any way to avoid this.  Note that this would be the case even if we had __attribute__ ((overloadable)), since that would also be a GNU extension.  Despite the mess, I think the performance improvements for non-strict-ANSI code make the dual maintenance worthwhile.

* "#pragma GCC target" is going to cause a lot of trouble.  With the patch in its present state, we fail gcc.target/powerpc/ppc-target-4.c, which tests the use of the "vsx", "altivec,no-vsx", and "no-altivec" target options, and happens to use vec_add (float, float) as a testbed.  The test fails because altivec.h is #included after the #pragma GCC target("vsx"), which allows the interfaces involving vector long long and vector double to be produced.  However, when the options are changed to "altivec,no-vsx", the subsequent invocation of vec_add expands to a dispatch sequence that includes vector long long, leading to a compile-time error.

I can only think of two ways to deal with this, neither of which is attractive.

The first idea would be to make altivec.h capable of being included more than once.  This essentially requires an #undef before each #define.  Once this is done, usage of #pragma GCC target would be supported provided that altivec.h is re-included after each such #pragma, so that the dispatch macros would be re-evaluated in the new context.  The problem with this is that existing code not conforming to this requirement would fail to compile, so this is probably off the table.

The other way would be to require a specific option on the command line to use the new dispatch mechanism.  When the option is present, we would predefine a macro such as __PPC_FAST_VECTOR__, which would then gate the usage in altivec.h and overload.h.
Use of #pragma GCC target to change the availability of Altivec, VMX, P8-vector, etc. would also be disallowed when the option is present.  This has the advantage of always generating correct code, at the cost of requiring a special option before anyone can leverage the benefits of early vector expansion.  That's unfortunate, but I suspect it's the best we can do.

The current patch is nearly complete, but the #pragma GCC target issue is not yet resolved.  I'd like to get opinions on the overall approach of the patch, and on whether you agree with my assessment of the #pragma issue, before taking the patch forward.

Thanks for reading this far, and thanks in advance for your opinions.  We can get some big performance improvements here eventually, but the road is a bit rocky.

Thanks,
Bill


[gcc]

2016-10-31  Bill Schmidt

	* config/rs6000/altivec.h: Add new include of overload.h; when not
	compiling for C++ or strict ANSI, add new #defines for vec_add in
	terms of the OVERLOAD_2ARG_28VAR and OVERLOAD_2ARG_DECL macros;
	when compiling for C++ but not for strict ANSI, use just the
	OVERLOAD_2ARG_DECL macros; when not compiling for strict ANSI,
	remove the #define of vec_add in terms of __builtin_vec_add.
	* config/rs6000/overload.h: New file, with #defines of
	OVERLOAD_2ARG_28VAR when not compiling for C++ or strict ANSI,
	and two different flavors of OVERLOAD_2ARG_DECL (C++ and
	otherwise) when not compiling for strict ANSI.
	* config.gcc: For each triple that includes altivec.h in
	extra_headers, also add overload.h.

[gcc/testsuite]

2016-10-31  Bill Schmidt

	* gcc.target/powerpc/overload-add-1.c: New.
	* gcc.target/powerpc/overload-add-2.c: New.
	* gcc.target/powerpc/overload-add-3.c: New.
	* gcc.target/powerpc/overload-add-4.c: New.
	* gcc.target/powerpc/overload-add-5.c: New.
	* gcc.target/powerpc/overload-add-6.c: New.
	* gcc.target/powerpc/overload-add-7.c: New.
Index: gcc/config/rs6000/altivec.h =================================================================== --- gcc/config/rs6000/altivec.h (revision 241624) +++ gcc/config/rs6000/altivec.h (working copy) @@ -53,6 +53,353 @@ #define __CR6_LT 2 #define __CR6_LT_REV 3 +/* Machinery to support overloaded functions in C. */ +#include "overload.h" + +/* Overloaded function declarations. Please maintain these in + alphabetical order. */ + +/* Since __builtin_choose_expr and __builtin_types_compatible_p + aren't permitted in C++, we'll need to use standard overloading + for those. Disable this mechanism for C++. GNU extensions are + also unavailable for -ansi, -std=c11, etc. */ +#ifndef __STRICT_ANSI__ +#ifndef __cplusplus + +#ifdef __POWER8_VECTOR__ +#define vec_add(a1, a2) \ + OVERLOAD_2ARG_28VAR(vec_add, a1, a2, \ + 1, vector bool char, vector signed char, \ + 2, vector signed char, vector bool char, \ + 3, vector signed char, vector signed char, \ + 4, vector bool char, vector unsigned char, \ + 5, vector unsigned char, vector bool char, \ + 6, vector unsigned char, vector unsigned char, \ + 7, vector bool short, vector signed short, \ + 8, vector signed short, vector bool short, \ + 9, vector signed short, vector signed short, \ + 10, vector bool short, vector unsigned short, \ + 11, vector unsigned short, vector bool short, \ + 12, vector unsigned short, vector unsigned short, \ + 13, vector bool int, vector signed int, \ + 14, vector signed int, vector bool int, \ + 15, vector signed int, vector signed int, \ + 16, vector bool int, vector unsigned int, \ + 17, vector unsigned int, vector bool int, \ + 18, vector unsigned int, vector unsigned int, \ + 19, vector bool long long, vector signed long long, \ + 20, vector signed long long, vector bool long long, \ + 21, vector signed long long, vector signed long long, \ + 22, vector bool long long, vector unsigned long long, \ + 23, vector unsigned long long, vector bool long long, \ + 24, vector unsigned long long, vector 
unsigned long long, \ + 25, vector float, vector float, \ + 26, vector double, vector double, \ + 27, vector signed __int128, vector signed __int128, \ + 28, vector unsigned __int128, vector unsigned __int128) +#elif defined __VSX__ +#define vec_add(a1, a2) \ + OVERLOAD_2ARG_28VAR(vec_add, a1, a2, \ + 1, vector bool char, vector signed char, \ + 2, vector signed char, vector bool char, \ + 3, vector signed char, vector signed char, \ + 4, vector bool char, vector unsigned char, \ + 5, vector unsigned char, vector bool char, \ + 6, vector unsigned char, vector unsigned char, \ + 7, vector bool short, vector signed short, \ + 8, vector signed short, vector bool short, \ + 9, vector signed short, vector signed short, \ + 10, vector bool short, vector unsigned short, \ + 11, vector unsigned short, vector bool short, \ + 12, vector unsigned short, vector unsigned short, \ + 13, vector bool int, vector signed int, \ + 14, vector signed int, vector bool int, \ + 15, vector signed int, vector signed int, \ + 16, vector bool int, vector unsigned int, \ + 17, vector unsigned int, vector bool int, \ + 18, vector unsigned int, vector unsigned int, \ + 19, vector bool long long, vector signed long long, \ + 20, vector signed long long, vector bool long long, \ + 21, vector signed long long, vector signed long long, \ + 22, vector bool long long, vector unsigned long long, \ + 23, vector unsigned long long, vector bool long long, \ + 24, vector unsigned long long, vector unsigned long long, \ + 25, vector float, vector float, \ + 26, vector double, vector double, \ + 26, vector double, vector double, \ + 26, vector double, vector double) +#else +#define vec_add(a1, a2) \ + OVERLOAD_2ARG_28VAR(vec_add, a1, a2, \ + 1, vector bool char, vector signed char, \ + 2, vector signed char, vector bool char, \ + 3, vector signed char, vector signed char, \ + 4, vector bool char, vector unsigned char, \ + 5, vector unsigned char, vector bool char, \ + 6, vector unsigned char, vector 
unsigned char, \ + 7, vector bool short, vector signed short, \ + 8, vector signed short, vector bool short, \ + 9, vector signed short, vector signed short, \ + 10, vector bool short, vector unsigned short, \ + 11, vector unsigned short, vector bool short, \ + 12, vector unsigned short, vector unsigned short, \ + 13, vector bool int, vector signed int, \ + 14, vector signed int, vector bool int, \ + 15, vector signed int, vector signed int, \ + 16, vector bool int, vector unsigned int, \ + 17, vector unsigned int, vector bool int, \ + 18, vector unsigned int, vector unsigned int, \ + 18, vector unsigned int, vector unsigned int, \ + 18, vector unsigned int, vector unsigned int, \ + 18, vector unsigned int, vector unsigned int, \ + 18, vector unsigned int, vector unsigned int, \ + 18, vector unsigned int, vector unsigned int, \ + 18, vector unsigned int, vector unsigned int, \ + 25, vector float, vector float, \ + 25, vector float, vector float, \ + 25, vector float, vector float, \ + 25, vector float, vector float) +#endif /* __POWER8_VECTOR__ #elif __VSX__ */ + +#endif /* !__cplusplus */ + +OVERLOAD_2ARG_DECL(vec_add, 1, \ + vector signed char, \ + vector bool char, a1, \ + vector signed char, a2) +{ + return (vector signed char)a1 + a2; +} + +OVERLOAD_2ARG_DECL(vec_add, 2, \ + vector signed char, \ + vector signed char, a1, \ + vector bool char, a2) +{ + return a1 + (vector signed char)a2; +} + +OVERLOAD_2ARG_DECL(vec_add, 3, \ + vector signed char, \ + vector signed char, a1, \ + vector signed char, a2) +{ + return a1 + a2; +} + +OVERLOAD_2ARG_DECL(vec_add, 4, \ + vector unsigned char, \ + vector bool char, a1, \ + vector unsigned char, a2) +{ + return (vector unsigned char)a1 + a2; +} + +OVERLOAD_2ARG_DECL(vec_add, 5, \ + vector unsigned char, \ + vector unsigned char, a1, \ + vector bool char, a2) +{ + return a1 + (vector unsigned char)a2; +} + +OVERLOAD_2ARG_DECL(vec_add, 6, \ + vector unsigned char, \ + vector unsigned char, a1, \ + vector unsigned char, 
a2) +{ + return a1 + a2; +} + +OVERLOAD_2ARG_DECL(vec_add, 7, \ + vector signed short, \ + vector bool short, a1, \ + vector signed short, a2) +{ + return (vector signed short)a1 + a2; +} + +OVERLOAD_2ARG_DECL(vec_add, 8, \ + vector signed short, \ + vector signed short, a1, \ + vector bool short, a2) +{ + return a1 + (vector signed short)a2; +} + +OVERLOAD_2ARG_DECL(vec_add, 9, \ + vector signed short, \ + vector signed short, a1, \ + vector signed short, a2) +{ + return a1 + a2; +} + +OVERLOAD_2ARG_DECL(vec_add, 10, \ + vector unsigned short, \ + vector bool short, a1, \ + vector unsigned short, a2) +{ + return (vector unsigned short)a1 + a2; +} + +OVERLOAD_2ARG_DECL(vec_add, 11, \ + vector unsigned short, \ + vector unsigned short, a1, \ + vector bool short, a2) +{ + return a1 + (vector unsigned short)a2; +} + +OVERLOAD_2ARG_DECL(vec_add, 12, \ + vector unsigned short, \ + vector unsigned short, a1, \ + vector unsigned short, a2) +{ + return a1 + a2; +} + +OVERLOAD_2ARG_DECL(vec_add, 13, \ + vector signed int, \ + vector bool int, a1, \ + vector signed int, a2) +{ + return (vector signed int)a1 + a2; +} + +OVERLOAD_2ARG_DECL(vec_add, 14, \ + vector signed int, \ + vector signed int, a1, \ + vector bool int, a2) +{ + return a1 + (vector signed int)a2; +} + +OVERLOAD_2ARG_DECL(vec_add, 15, \ + vector signed int, \ + vector signed int, a1, \ + vector signed int, a2) +{ + return a1 + a2; +} + +OVERLOAD_2ARG_DECL(vec_add, 16, \ + vector unsigned int, \ + vector bool int, a1, \ + vector unsigned int, a2) +{ + return (vector unsigned int)a1 + a2; +} + +OVERLOAD_2ARG_DECL(vec_add, 17, \ + vector unsigned int, \ + vector unsigned int, a1, \ + vector bool int, a2) +{ + return a1 + (vector unsigned int)a2; +} + +OVERLOAD_2ARG_DECL(vec_add, 18, \ + vector unsigned int, \ + vector unsigned int, a1, \ + vector unsigned int, a2) +{ + return a1 + a2; +} + +#ifdef __VSX__ +OVERLOAD_2ARG_DECL(vec_add, 19, \ + vector signed long long, \ + vector bool long long, a1, \ + vector 
signed long long, a2) +{ + return (vector signed long long)a1 + a2; +} + +OVERLOAD_2ARG_DECL(vec_add, 20, \ + vector signed long long, \ + vector signed long long, a1, \ + vector bool long long, a2) +{ + return a1 + (vector signed long long)a2; +} + +OVERLOAD_2ARG_DECL(vec_add, 21, \ + vector signed long long, \ + vector signed long long, a1, \ + vector signed long long, a2) +{ + return a1 + a2; +} + +OVERLOAD_2ARG_DECL(vec_add, 22, \ + vector unsigned long long, \ + vector bool long long, a1, \ + vector unsigned long long, a2) +{ + return (vector unsigned long long)a1 + a2; +} + +OVERLOAD_2ARG_DECL(vec_add, 23, \ + vector unsigned long long, \ + vector unsigned long long, a1, \ + vector bool long long, a2) +{ + return a1 + (vector unsigned long long)a2; +} + +OVERLOAD_2ARG_DECL(vec_add, 24, \ + vector unsigned long long, \ + vector unsigned long long, a1, \ + vector unsigned long long, a2) +{ + return a1 + a2; +} +#endif /* __VSX__ */ + +OVERLOAD_2ARG_DECL(vec_add, 25, \ + vector float, \ + vector float, a1, \ + vector float, a2) +{ + return a1 + a2; +} + +#ifdef __VSX__ +OVERLOAD_2ARG_DECL(vec_add, 26, \ + vector double, \ + vector double, a1, \ + vector double, a2) +{ + return a1 + a2; +} +#endif /* __VSX__ */ + +/* Currently we do not early-expand vec_add for vector __int128. This + is because vector lowering in the middle end casts V1TImode to TImode, + which is probably appropriate since we have very little support for + V1TImode arithmetic. Late expansion ensures we get the single + instruction add. */ +#ifdef __POWER8_VECTOR__ +OVERLOAD_2ARG_DECL(vec_add, 27, \ + vector signed __int128, \ + vector signed __int128, a1, \ + vector signed __int128, a2) +{ + return __builtin_vec_add (a1, a2); +} + +OVERLOAD_2ARG_DECL(vec_add, 28, \ + vector unsigned __int128, \ + vector unsigned __int128, a1, \ + vector unsigned __int128, a2) +{ + return __builtin_vec_add (a1, a2); +} +#endif /* __POWER8_VECTOR__ */ + +#endif /* !__STRICT_ANSI__ */ + /* Synonyms. 
*/ #define vec_vaddcuw vec_addc #define vec_vand vec_and @@ -190,7 +537,9 @@ #define vec_vupklsb __builtin_vec_vupklsb #define vec_abs __builtin_vec_abs #define vec_abss __builtin_vec_abss +#ifdef __STRICT_ANSI__ #define vec_add __builtin_vec_add +#endif #define vec_adds __builtin_vec_adds #define vec_and __builtin_vec_and #define vec_andc __builtin_vec_andc Index: gcc/config/rs6000/overload.h =================================================================== --- gcc/config/rs6000/overload.h (revision 0) +++ gcc/config/rs6000/overload.h (working copy) @@ -0,0 +1,206 @@ +/* Overloaded Built-In Function Support + Copyright (C) 2016 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published + by the Free Software Foundation; either version 3, or (at your + option) any later version. + + GCC is distributed in the hope that it will be useful, but WITHOUT + ANY WARRANTY; without even the implied warranty of MERCHANTABILITY + or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public + License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + . */ + +#ifndef _OVERLOAD_H +#define _OVERLOAD_H 1 + +/* Since __builtin_choose_expr and __builtin_types_compatible_p + aren't permitted in C++, we'll need to use standard overloading + for those. Disable this mechanism for C++. GNU extensions are + also unavailable for -ansi, -std=c11, etc. 
*/ +#if !defined __cplusplus && !defined __STRICT_ANSI__ + +/* Macros named OVERLOAD_ARG_VAR provide a dispatch mechanism + for built-in functions taking N input arguments and M overloaded + variants. Note that indentation conventions for nested calls to + __builtin_choose_expr are violated for practicality. Please + maintain these macros in increasing order by N and M for ease + of reuse. */ + +#define OVERLOAD_2ARG_28VAR(NAME, ARG1, ARG2, \ + VAR1_ID, VAR1_TYPE1, VAR1_TYPE2, \ + VAR2_ID, VAR2_TYPE1, VAR2_TYPE2, \ + VAR3_ID, VAR3_TYPE1, VAR3_TYPE2, \ + VAR4_ID, VAR4_TYPE1, VAR4_TYPE2, \ + VAR5_ID, VAR5_TYPE1, VAR5_TYPE2, \ + VAR6_ID, VAR6_TYPE1, VAR6_TYPE2, \ + VAR7_ID, VAR7_TYPE1, VAR7_TYPE2, \ + VAR8_ID, VAR8_TYPE1, VAR8_TYPE2, \ + VAR9_ID, VAR9_TYPE1, VAR9_TYPE2, \ + VAR10_ID, VAR10_TYPE1, VAR10_TYPE2, \ + VAR11_ID, VAR11_TYPE1, VAR11_TYPE2, \ + VAR12_ID, VAR12_TYPE1, VAR12_TYPE2, \ + VAR13_ID, VAR13_TYPE1, VAR13_TYPE2, \ + VAR14_ID, VAR14_TYPE1, VAR14_TYPE2, \ + VAR15_ID, VAR15_TYPE1, VAR15_TYPE2, \ + VAR16_ID, VAR16_TYPE1, VAR16_TYPE2, \ + VAR17_ID, VAR17_TYPE1, VAR17_TYPE2, \ + VAR18_ID, VAR18_TYPE1, VAR18_TYPE2, \ + VAR19_ID, VAR19_TYPE1, VAR19_TYPE2, \ + VAR20_ID, VAR20_TYPE1, VAR20_TYPE2, \ + VAR21_ID, VAR21_TYPE1, VAR21_TYPE2, \ + VAR22_ID, VAR22_TYPE1, VAR22_TYPE2, \ + VAR23_ID, VAR23_TYPE1, VAR23_TYPE2, \ + VAR24_ID, VAR24_TYPE1, VAR24_TYPE2, \ + VAR25_ID, VAR25_TYPE1, VAR25_TYPE2, \ + VAR26_ID, VAR26_TYPE1, VAR26_TYPE2, \ + VAR27_ID, VAR27_TYPE1, VAR27_TYPE2, \ + VAR28_ID, VAR28_TYPE1, VAR28_TYPE2) \ + __builtin_choose_expr ( \ + __builtin_types_compatible_p (__typeof__ (ARG1), VAR1_TYPE1) \ + && __builtin_types_compatible_p (__typeof__ (ARG2), VAR1_TYPE2), \ + _##NAME##_##VAR1_ID ((VAR1_TYPE1)ARG1, (VAR1_TYPE2)ARG2), \ + __builtin_choose_expr ( \ + __builtin_types_compatible_p (__typeof__ (ARG1), VAR2_TYPE1) \ + && __builtin_types_compatible_p (__typeof__ (ARG2), VAR2_TYPE2), \ + _##NAME##_##VAR2_ID ((VAR2_TYPE1)ARG1, (VAR2_TYPE2)ARG2), \ + 
__builtin_choose_expr ( \ + __builtin_types_compatible_p (__typeof__ (ARG1), VAR3_TYPE1) \ + && __builtin_types_compatible_p (__typeof__ (ARG2), VAR3_TYPE2), \ + _##NAME##_##VAR3_ID ((VAR3_TYPE1)ARG1, (VAR3_TYPE2)ARG2), \ + __builtin_choose_expr ( \ + __builtin_types_compatible_p (__typeof__ (ARG1), VAR4_TYPE1) \ + && __builtin_types_compatible_p (__typeof__ (ARG2), VAR4_TYPE2), \ + _##NAME##_##VAR4_ID ((VAR4_TYPE1)ARG1, (VAR4_TYPE2)ARG2), \ + __builtin_choose_expr ( \ + __builtin_types_compatible_p (__typeof__ (ARG1), VAR5_TYPE1) \ + && __builtin_types_compatible_p (__typeof__ (ARG2), VAR5_TYPE2), \ + _##NAME##_##VAR5_ID ((VAR5_TYPE1)ARG1, (VAR5_TYPE2)ARG2), \ + __builtin_choose_expr ( \ + __builtin_types_compatible_p (__typeof__ (ARG1), VAR6_TYPE1) \ + && __builtin_types_compatible_p (__typeof__ (ARG2), VAR6_TYPE2), \ + _##NAME##_##VAR6_ID ((VAR6_TYPE1)ARG1, (VAR6_TYPE2)ARG2), \ + __builtin_choose_expr ( \ + __builtin_types_compatible_p (__typeof__ (ARG1), VAR7_TYPE1) \ + && __builtin_types_compatible_p (__typeof__ (ARG2), VAR7_TYPE2), \ + _##NAME##_##VAR7_ID ((VAR7_TYPE1)ARG1, (VAR7_TYPE2)ARG2), \ + __builtin_choose_expr ( \ + __builtin_types_compatible_p (__typeof__ (ARG1), VAR8_TYPE1) \ + && __builtin_types_compatible_p (__typeof__ (ARG2), VAR8_TYPE2), \ + _##NAME##_##VAR8_ID ((VAR8_TYPE1)ARG1, (VAR8_TYPE2)ARG2), \ + __builtin_choose_expr ( \ + __builtin_types_compatible_p (__typeof__ (ARG1), VAR9_TYPE1) \ + && __builtin_types_compatible_p (__typeof__ (ARG2), VAR9_TYPE2), \ + _##NAME##_##VAR9_ID ((VAR9_TYPE1)ARG1, (VAR9_TYPE2)ARG2), \ + __builtin_choose_expr ( \ + __builtin_types_compatible_p (__typeof__ (ARG1), VAR10_TYPE1) \ + && __builtin_types_compatible_p (__typeof__ (ARG2), VAR10_TYPE2), \ + _##NAME##_##VAR10_ID ((VAR10_TYPE1)ARG1, (VAR10_TYPE2)ARG2), \ + __builtin_choose_expr ( \ + __builtin_types_compatible_p (__typeof__ (ARG1), VAR11_TYPE1) \ + && __builtin_types_compatible_p (__typeof__ (ARG2), VAR11_TYPE2), \ + _##NAME##_##VAR11_ID 
((VAR11_TYPE1)ARG1, (VAR11_TYPE2)ARG2), \ + __builtin_choose_expr ( \ + __builtin_types_compatible_p (__typeof__ (ARG1), VAR12_TYPE1) \ + && __builtin_types_compatible_p (__typeof__ (ARG2), VAR12_TYPE2), \ + _##NAME##_##VAR12_ID ((VAR12_TYPE1)ARG1, (VAR12_TYPE2)ARG2), \ + __builtin_choose_expr ( \ + __builtin_types_compatible_p (__typeof__ (ARG1), VAR13_TYPE1) \ + && __builtin_types_compatible_p (__typeof__ (ARG2), VAR13_TYPE2), \ + _##NAME##_##VAR13_ID ((VAR13_TYPE1)ARG1, (VAR13_TYPE2)ARG2), \ + __builtin_choose_expr ( \ + __builtin_types_compatible_p (__typeof__ (ARG1), VAR14_TYPE1) \ + && __builtin_types_compatible_p (__typeof__ (ARG2), VAR14_TYPE2), \ + _##NAME##_##VAR14_ID ((VAR14_TYPE1)ARG1, (VAR14_TYPE2)ARG2), \ + __builtin_choose_expr ( \ + __builtin_types_compatible_p (__typeof__ (ARG1), VAR15_TYPE1) \ + && __builtin_types_compatible_p (__typeof__ (ARG2), VAR15_TYPE2), \ + _##NAME##_##VAR15_ID ((VAR15_TYPE1)ARG1, (VAR15_TYPE2)ARG2), \ + __builtin_choose_expr ( \ + __builtin_types_compatible_p (__typeof__ (ARG1), VAR16_TYPE1) \ + && __builtin_types_compatible_p (__typeof__ (ARG2), VAR16_TYPE2), \ + _##NAME##_##VAR16_ID ((VAR16_TYPE1)ARG1, (VAR16_TYPE2)ARG2), \ + __builtin_choose_expr ( \ + __builtin_types_compatible_p (__typeof__ (ARG1), VAR17_TYPE1) \ + && __builtin_types_compatible_p (__typeof__ (ARG2), VAR17_TYPE2), \ + _##NAME##_##VAR17_ID ((VAR17_TYPE1)ARG1, (VAR17_TYPE2)ARG2), \ + __builtin_choose_expr ( \ + __builtin_types_compatible_p (__typeof__ (ARG1), VAR18_TYPE1) \ + && __builtin_types_compatible_p (__typeof__ (ARG2), VAR18_TYPE2), \ + _##NAME##_##VAR18_ID ((VAR18_TYPE1)ARG1, (VAR18_TYPE2)ARG2), \ + __builtin_choose_expr ( \ + __builtin_types_compatible_p (__typeof__ (ARG1), VAR19_TYPE1) \ + && __builtin_types_compatible_p (__typeof__ (ARG2), VAR19_TYPE2), \ + _##NAME##_##VAR19_ID ((VAR19_TYPE1)ARG1, (VAR19_TYPE2)ARG2), \ + __builtin_choose_expr ( \ + __builtin_types_compatible_p (__typeof__ (ARG1), VAR20_TYPE1) \ + && 
__builtin_types_compatible_p (__typeof__ (ARG2), VAR20_TYPE2), \ + _##NAME##_##VAR20_ID ((VAR20_TYPE1)ARG1, (VAR20_TYPE2)ARG2), \ + __builtin_choose_expr ( \ + __builtin_types_compatible_p (__typeof__ (ARG1), VAR21_TYPE1) \ + && __builtin_types_compatible_p (__typeof__ (ARG2), VAR21_TYPE2), \ + _##NAME##_##VAR21_ID ((VAR21_TYPE1)ARG1, (VAR21_TYPE2)ARG2), \ + __builtin_choose_expr ( \ + __builtin_types_compatible_p (__typeof__ (ARG1), VAR22_TYPE1) \ + && __builtin_types_compatible_p (__typeof__ (ARG2), VAR22_TYPE2), \ + _##NAME##_##VAR22_ID ((VAR22_TYPE1)ARG1, (VAR22_TYPE2)ARG2), \ + __builtin_choose_expr ( \ + __builtin_types_compatible_p (__typeof__ (ARG1), VAR23_TYPE1) \ + && __builtin_types_compatible_p (__typeof__ (ARG2), VAR23_TYPE2), \ + _##NAME##_##VAR23_ID ((VAR23_TYPE1)ARG1, (VAR23_TYPE2)ARG2), \ + __builtin_choose_expr ( \ + __builtin_types_compatible_p (__typeof__ (ARG1), VAR24_TYPE1) \ + && __builtin_types_compatible_p (__typeof__ (ARG2), VAR24_TYPE2), \ + _##NAME##_##VAR24_ID ((VAR24_TYPE1)ARG1, (VAR24_TYPE2)ARG2), \ + __builtin_choose_expr ( \ + __builtin_types_compatible_p (__typeof__ (ARG1), VAR25_TYPE1) \ + && __builtin_types_compatible_p (__typeof__ (ARG2), VAR25_TYPE2), \ + _##NAME##_##VAR25_ID ((VAR25_TYPE1)ARG1, (VAR25_TYPE2)ARG2), \ + __builtin_choose_expr ( \ + __builtin_types_compatible_p (__typeof__ (ARG1), VAR26_TYPE1) \ + && __builtin_types_compatible_p (__typeof__ (ARG2), VAR26_TYPE2), \ + _##NAME##_##VAR26_ID ((VAR26_TYPE1)ARG1, (VAR26_TYPE2)ARG2), \ + __builtin_choose_expr ( \ + __builtin_types_compatible_p (__typeof__ (ARG1), VAR27_TYPE1) \ + && __builtin_types_compatible_p (__typeof__ (ARG2), VAR27_TYPE2), \ + _##NAME##_##VAR27_ID ((VAR27_TYPE1)ARG1, (VAR27_TYPE2)ARG2), \ + __builtin_choose_expr ( \ + __builtin_types_compatible_p (__typeof__ (ARG1), VAR28_TYPE1) \ + && __builtin_types_compatible_p (__typeof__ (ARG2), VAR28_TYPE2), \ + _##NAME##_##VAR28_ID ((VAR28_TYPE1)ARG1, (VAR28_TYPE2)ARG2), \ + (void)0)))))))))))))))))))))))))))) 
+
+/* Macros named OVERLOAD_<N>ARG_DECL provide a declaration for one
+   variant of an overloaded built-in function having N arguments.
+   Please maintain these macros in increasing order by N for ease
+   of reuse.  */
+
+#define OVERLOAD_2ARG_DECL(NAME, VAR_ID, TYPE0, \
+                           TYPE1, ARG1, \
+                           TYPE2, ARG2) \
+static __inline__ TYPE0 __attribute__ ((__always_inline__)) \
+_##NAME##_##VAR_ID (TYPE1 ARG1, TYPE2 ARG2)
+
+/* With C++, we can just use function overloading.  */
+#elif defined __cplusplus && !defined __STRICT_ANSI__
+
+#define OVERLOAD_2ARG_DECL(NAME, VAR_ID, TYPE0, \
+                           TYPE1, ARG1, \
+                           TYPE2, ARG2) \
+static __inline__ TYPE0 __attribute__ ((__always_inline__)) \
+NAME (TYPE1 ARG1, TYPE2 ARG2)
+
+#endif /* !__cplusplus && !__STRICT_ANSI__ */
+
+#endif /* _OVERLOAD_H */
Index: gcc/config.gcc
===================================================================
--- gcc/config.gcc	(revision 241624)
+++ gcc/config.gcc	(working copy)
@@ -440,7 +440,7 @@ nvptx-*-*)
 	;;
 powerpc*-*-*)
 	cpu_type=rs6000
-	extra_headers="ppc-asm.h altivec.h spe.h ppu_intrinsics.h paired.h spu2vmx.h vec_types.h si2vmx.h htmintrin.h htmxlintrin.h"
+	extra_headers="ppc-asm.h altivec.h spe.h ppu_intrinsics.h paired.h spu2vmx.h vec_types.h si2vmx.h htmintrin.h htmxlintrin.h overload.h"
 	case x$with_cpu in
 	    xpowerpc64|xdefault64|x6[23]0|x970|xG5|xpower[3456789]|xpower6x|xrs64a|xcell|xa2|xe500mc64|xe5500|xe6500)
 		cpu_is_64bit=yes
@@ -2279,13 +2279,13 @@ powerpc-*-darwin*)
 	esac
 	tmake_file="${tmake_file} t-slibgcc"
-	extra_headers=altivec.h
+	extra_headers="altivec.h overload.h"
 	;;
 powerpc64-*-darwin*)
 	extra_options="${extra_options} ${cpu_type}/darwin.opt"
 	tmake_file="${tmake_file} ${cpu_type}/t-darwin64 t-slibgcc"
 	tm_file="${tm_file} ${cpu_type}/darwin8.h ${cpu_type}/darwin64.h"
-	extra_headers=altivec.h
+	extra_headers="altivec.h overload.h"
 	;;
 powerpc*-*-freebsd*)
 	tm_file="${tm_file} dbxelf.h elfos.h ${fbsd_tm_file} rs6000/sysv4.h"
@@ -2512,7 +2512,7 @@ rs6000-ibm-aix5.3.* | powerpc-ibm-aix5.3.*)
 	use_collect2=yes
 	thread_file='aix'
 	use_gcc_stdint=wrap
-	extra_headers=altivec.h
+	extra_headers="altivec.h overload.h"
 	;;
 rs6000-ibm-aix6.* | powerpc-ibm-aix6.*)
 	tm_file="${tm_file} rs6000/aix.h rs6000/aix61.h rs6000/xcoff.h rs6000/aix-stdint.h"
@@ -2521,7 +2521,7 @@ rs6000-ibm-aix6.* | powerpc-ibm-aix6.*)
 	use_collect2=yes
 	thread_file='aix'
 	use_gcc_stdint=wrap
-	extra_headers=altivec.h
+	extra_headers="altivec.h overload.h"
 	default_use_cxa_atexit=yes
 	;;
 rs6000-ibm-aix[789].* | powerpc-ibm-aix[789].*)
@@ -2531,7 +2531,7 @@ rs6000-ibm-aix[789].* | powerpc-ibm-aix[789].*)
 	use_collect2=yes
 	thread_file='aix'
 	use_gcc_stdint=wrap
-	extra_headers=altivec.h
+	extra_headers="altivec.h overload.h"
 	default_use_cxa_atexit=yes
 	;;
 rl78-*-elf*)
Index: gcc/testsuite/gcc.target/powerpc/overload-add-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/overload-add-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/overload-add-1.c	(working copy)
@@ -0,0 +1,46 @@
+/* Verify that overloaded built-ins for vec_add with char
+   inputs produce the right results.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-additional-options "-std=gnu11" } */
+
+#include <altivec.h>
+
+vector signed char
+test1 (vector bool char x, vector signed char y)
+{
+  return vec_add (x, y);
+}
+
+vector signed char
+test2 (vector signed char x, vector bool char y)
+{
+  return vec_add (x, y);
+}
+
+vector signed char
+test3 (vector signed char x, vector signed char y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned char
+test4 (vector bool char x, vector unsigned char y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned char
+test5 (vector unsigned char x, vector bool char y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned char
+test6 (vector unsigned char x, vector unsigned char y)
+{
+  return vec_add (x, y);
+}
+
+/* { dg-final { scan-assembler-times "vaddubm" 6 } } */
Index: gcc/testsuite/gcc.target/powerpc/overload-add-2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/overload-add-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/overload-add-2.c	(working copy)
@@ -0,0 +1,46 @@
+/* Verify that overloaded built-ins for vec_add with short
+   inputs produce the right results.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-additional-options "-std=gnu11" } */
+
+#include <altivec.h>
+
+vector signed short
+test1 (vector bool short x, vector signed short y)
+{
+  return vec_add (x, y);
+}
+
+vector signed short
+test2 (vector signed short x, vector bool short y)
+{
+  return vec_add (x, y);
+}
+
+vector signed short
+test3 (vector signed short x, vector signed short y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned short
+test4 (vector bool short x, vector unsigned short y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned short
+test5 (vector unsigned short x, vector bool short y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned short
+test6 (vector unsigned short x, vector unsigned short y)
+{
+  return vec_add (x, y);
+}
+
+/* { dg-final { scan-assembler-times "vadduhm" 6 } } */
Index: gcc/testsuite/gcc.target/powerpc/overload-add-3.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/overload-add-3.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/overload-add-3.c	(working copy)
@@ -0,0 +1,46 @@
+/* Verify that overloaded built-ins for vec_add with int
+   inputs produce the right results.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-additional-options "-std=gnu11" } */
+
+#include <altivec.h>
+
+vector signed int
+test1 (vector bool int x, vector signed int y)
+{
+  return vec_add (x, y);
+}
+
+vector signed int
+test2 (vector signed int x, vector bool int y)
+{
+  return vec_add (x, y);
+}
+
+vector signed int
+test3 (vector signed int x, vector signed int y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned int
+test4 (vector bool int x, vector unsigned int y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned int
+test5 (vector unsigned int x, vector bool int y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned int
+test6 (vector unsigned int x, vector unsigned int y)
+{
+  return vec_add (x, y);
+}
+
+/* { dg-final { scan-assembler-times "vadduwm" 6 } } */
Index: gcc/testsuite/gcc.target/powerpc/overload-add-4.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/overload-add-4.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/overload-add-4.c	(working copy)
@@ -0,0 +1,46 @@
+/* Verify that overloaded built-ins for vec_add with long long
+   inputs produce the right results.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-additional-options "-std=gnu11" } */
+
+#include <altivec.h>
+
+vector signed long long
+test1 (vector bool long long x, vector signed long long y)
+{
+  return vec_add (x, y);
+}
+
+vector signed long long
+test2 (vector signed long long x, vector bool long long y)
+{
+  return vec_add (x, y);
+}
+
+vector signed long long
+test3 (vector signed long long x, vector signed long long y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned long long
+test4 (vector bool long long x, vector unsigned long long y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned long long
+test5 (vector unsigned long long x, vector bool long long y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned long long
+test6 (vector unsigned long long x, vector unsigned long long y)
+{
+  return vec_add (x, y);
+}
+
+/* { dg-final { scan-assembler-times "vaddudm" 6 } } */
Index: gcc/testsuite/gcc.target/powerpc/overload-add-5.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/overload-add-5.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/overload-add-5.c	(working copy)
@@ -0,0 +1,16 @@
+/* Verify that overloaded built-ins for vec_add with float
+   inputs produce the right results.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-additional-options "-std=gnu11 -mno-vsx" } */
+
+#include <altivec.h>
+
+vector float
+test1 (vector float x, vector float y)
+{
+  return vec_add (x, y);
+}
+
+/* { dg-final { scan-assembler-times "vaddfp" 1 } } */
Index: gcc/testsuite/gcc.target/powerpc/overload-add-6.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/overload-add-6.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/overload-add-6.c	(working copy)
@@ -0,0 +1,23 @@
+/* Verify that overloaded built-ins for vec_add with float and
+   double inputs for VSX produce the right results.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-additional-options "-std=gnu11" } */
+
+#include <altivec.h>
+
+vector float
+test1 (vector float x, vector float y)
+{
+  return vec_add (x, y);
+}
+
+vector double
+test2 (vector double x, vector double y)
+{
+  return vec_add (x, y);
+}
+
+/* { dg-final { scan-assembler-times "xvaddsp" 1 } } */
+/* { dg-final { scan-assembler-times "xvadddp" 1 } } */
Index: gcc/testsuite/gcc.target/powerpc/overload-add-7.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/overload-add-7.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/overload-add-7.c	(working copy)
@@ -0,0 +1,22 @@
+/* Verify that overloaded built-ins for vec_add with __int128
+   inputs produce the right results.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-additional-options "-std=gnu11 -Wno-pedantic" } */
+
+#include "altivec.h"
+
+vector signed __int128
+test1 (vector signed __int128 x, vector signed __int128 y)
+{
+  return vec_add (x, y);
+}
+
+vector unsigned __int128
+test2 (vector unsigned __int128 x, vector unsigned __int128 y)
+{
+  return vec_add (x, y);
+}
+
+/* { dg-final { scan-assembler-times "vadduqm" 2 } } */