From patchwork Fri Aug 20 19:27:21 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Paul A. Clarke" X-Patchwork-Id: 1519170 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=bKfIGqQ5; dkim-atps=neutral Received: from sourceware.org (server2.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4GrsHY668mz9sX1 for ; Sat, 21 Aug 2021 05:33:13 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id A02D4386EC05 for ; Fri, 20 Aug 2021 19:33:11 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org A02D4386EC05 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1629487991; bh=6W81hsYme2w/pKjA+WrDYFaD4vmFpohLz/sgtO+mE0U=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=bKfIGqQ5TFsIJAf7ha3YwWuT8P9yEiLvYdbzK36SwlIMxR2Hqeb9mX/wKDvDtrUhP zoPYGxP6Z+0YurhmgMTm45XCbG1ZeLVQ4P6CADzZo4qoSc680NXE/ZRVfxoU/HMXhu 8bdp3oD8BVabd24CA+iOAvpLoE7xzijSeBUK6AQo= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mx0b-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by sourceware.org (Postfix) with ESMTPS id 57004385C400 for ; Fri, 20 Aug 2021 19:27:43 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 57004385C400 Received: from pps.filterd (m0098421.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.43/8.16.0.43) with SMTP id 17KJ4PCB119431; Fri, 20 Aug 2021 15:27:42 -0400 Received: from ppma03dal.us.ibm.com (b.bd.3ea9.ip4.static.sl-reverse.com [169.62.189.11]) by mx0a-001b2d01.pphosted.com with ESMTP id 3ahpr8utc4-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 20 Aug 2021 15:27:42 -0400 Received: from pps.filterd (ppma03dal.us.ibm.com [127.0.0.1]) by ppma03dal.us.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 17KJ8XEi020410; Fri, 20 Aug 2021 19:27:41 GMT Received: from b01cxnp23032.gho.pok.ibm.com (b01cxnp23032.gho.pok.ibm.com [9.57.198.27]) by ppma03dal.us.ibm.com with ESMTP id 3ae5fjatrh-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 20 Aug 2021 19:27:41 +0000 Received: from b01ledav001.gho.pok.ibm.com (b01ledav001.gho.pok.ibm.com [9.57.199.106]) by b01cxnp23032.gho.pok.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 17KJRenj53281124 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 20 Aug 2021 19:27:40 GMT Received: from b01ledav001.gho.pok.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 4CEB82806A; Fri, 20 Aug 2021 19:27:40 +0000 (GMT) Received: from b01ledav001.gho.pok.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 0B8992806F; Fri, 20 Aug 2021 19:27:40 +0000 (GMT) Received: from localhost (unknown [9.77.137.12]) by b01ledav001.gho.pok.ibm.com (Postfix) with ESMTP; Fri, 20 Aug 2021 19:27:39 +0000 (GMT) To: gcc-patches@gcc.gnu.org Subject: [PATCH v2 1/6] rs6000: Support SSE4.1 "round" intrinsics Date: Fri, 20 Aug 2021 14:27:21 -0500 Message-Id: <20210820192726.1575524-2-pc@us.ibm.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20210820192726.1575524-1-pc@us.ibm.com> References: <20210820192726.1575524-1-pc@us.ibm.com> MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Proofpoint-GUID: UERE4Bdc-52hrlCUt-5wTi7J3S79HyQQ X-Proofpoint-ORIG-GUID: UERE4Bdc-52hrlCUt-5wTi7J3S79HyQQ X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.391, 18.0.790 definitions=2021-08-20_08:2021-08-20, 2021-08-20 signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 adultscore=0 bulkscore=0 priorityscore=1501 suspectscore=0 mlxlogscore=999 malwarescore=0 impostorscore=0 phishscore=0 lowpriorityscore=0 clxscore=1015 spamscore=0 mlxscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2107140000 definitions=main-2108200107 X-Spam-Status: No, score=-12.2 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: "Paul A. Clarke via Gcc-patches" From: "Paul A. Clarke" Reply-To: "Paul A. Clarke" Cc: wschmidt@linux.ibm.com, segher@kernel.crashing.org Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" Suppress exceptions (when specified), by saving, manipulating, and restoring the FPSCR. Similarly, save, set, and restore the floating-point rounding mode when required. No attempt is made to optimize writing the FPSCR (by checking if the new value would be the same), other than using lighter weight instructions when possible. The scalar versions naively use the parallel versions to compute the single scalar result and then construct the remainder of the result. Of minor note, the values of _MM_FROUND_TO_NEG_INF and _MM_FROUND_TO_ZERO are swapped from the corresponding values on x86 so as to match the corresponding rounding mode values in the Power ISA. Move implementations of _mm_ceil* and _mm_floor* into _mm_round*, and convert _mm_ceil* and _mm_floor* into macros. This matches the current analogous implementations in config/i386/smmintrin.h. Function signatures match the analogous functions in config/i386/smmintrin.h. Add tests for _mm_round_pd, _mm_round_ps, _mm_round_sd, _mm_round_ss, modeled after the very similar "floor" and "ceil" tests. Include basic tests, plus tests at the boundaries for floating-point representation, positive and negative, test all of the parameterized rounding modes as well as the C99 rounding modes and interactions between the two. Exceptions are not explicitly tested. 2021-08-20 Paul A. Clarke gcc * config/rs6000/smmintrin.h (_mm_round_pd, _mm_round_ps, _mm_round_sd, _mm_round_ss, _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_ZERO, _MM_FROUND_TO_POS_INF, _MM_FROUND_TO_NEG_INF, _MM_FROUND_CUR_DIRECTION, _MM_FROUND_RAISE_EXC, _MM_FROUND_NO_EXC, _MM_FROUND_NINT, _MM_FROUND_FLOOR, _MM_FROUND_CEIL, _MM_FROUND_TRUNC, _MM_FROUND_RINT, _MM_FROUND_NEARBYINT): New. * config/rs6000/smmintrin.h (_mm_ceil_pd, _mm_ceil_ps, _mm_ceil_sd, _mm_ceil_ss, _mm_floor_pd, _mm_floor_ps, _mm_floor_sd, _mm_floor_ss): Convert from function to macro. gcc/testsuite * gcc.target/powerpc/sse4_1-round3.h: New. * gcc.target/powerpc/sse4_1-roundpd.c: New. * gcc.target/powerpc/sse4_1-roundps.c: New. * gcc.target/powerpc/sse4_1-roundsd.c: New. * gcc.target/powerpc/sse4_1-roundss.c: New. --- v2: - Replaced clever (and broken) exception masking with more straightforward implementation, per v1 review and closer inspection. mtfsf was only writing the final nybble (1) instead of the final two nybbles (2), so not all of the exception-enable bits were cleared. - Renamed some variables from cryptic "tmp" and "save" to "fpscr_save" and "enables_save". - Retained use of __builtin_mffsl, since that is supported pre-POWER8 (with an alternate instruction sequence). - Added "extern" to functions to maintain compatible decorations with like implementations in gcc/config/i386. - Added some additional text to the commit message about some of the (unpleasant?) implementations and decorations coming from like implementations in gcc/config/i386, per v1 review. - Removed "-Wno-psabi" from tests as unnecessary, per v1 review. - Fixed indentation and other minor formatting changes, per v1 review. - Noted testing in patch series cover letter. gcc/config/rs6000/smmintrin.h | 240 +++++++++++----- .../gcc.target/powerpc/sse4_1-round3.h | 81 ++++++ .../gcc.target/powerpc/sse4_1-roundpd.c | 143 ++++++++++ .../gcc.target/powerpc/sse4_1-roundps.c | 98 +++++++ .../gcc.target/powerpc/sse4_1-roundsd.c | 256 ++++++++++++++++++ .../gcc.target/powerpc/sse4_1-roundss.c | 208 ++++++++++++++ 6 files changed, 962 insertions(+), 64 deletions(-) create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-round3.h create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd.c create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundps.c create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundsd.c create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundss.c diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h index 3767a67eada7..a6b88d313ad0 100644 --- a/gcc/config/rs6000/smmintrin.h +++ b/gcc/config/rs6000/smmintrin.h @@ -42,6 +42,182 @@ #include #include +/* Rounding mode macros. */ +#define _MM_FROUND_TO_NEAREST_INT 0x00 +#define _MM_FROUND_TO_ZERO 0x01 +#define _MM_FROUND_TO_POS_INF 0x02 +#define _MM_FROUND_TO_NEG_INF 0x03 +#define _MM_FROUND_CUR_DIRECTION 0x04 + +#define _MM_FROUND_NINT \ + (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_RAISE_EXC) +#define _MM_FROUND_FLOOR \ + (_MM_FROUND_TO_NEG_INF | _MM_FROUND_RAISE_EXC) +#define _MM_FROUND_CEIL \ + (_MM_FROUND_TO_POS_INF | _MM_FROUND_RAISE_EXC) +#define _MM_FROUND_TRUNC \ + (_MM_FROUND_TO_ZERO | _MM_FROUND_RAISE_EXC) +#define _MM_FROUND_RINT \ + (_MM_FROUND_CUR_DIRECTION | _MM_FROUND_RAISE_EXC) +#define _MM_FROUND_NEARBYINT \ + (_MM_FROUND_CUR_DIRECTION | _MM_FROUND_NO_EXC) + +#define _MM_FROUND_RAISE_EXC 0x00 +#define _MM_FROUND_NO_EXC 0x08 + +extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_round_pd (__m128d __A, int __rounding) +{ + __v2df __r; + union { + double __fr; + long long __fpscr; + } __enables_save, __fpscr_save; + + if (__rounding & _MM_FROUND_NO_EXC) + { + /* Save enabled exceptions, disable all exceptions, + and preserve the rounding mode. */ +#ifdef _ARCH_PWR9 + __asm__ __volatile__ ("mffsce %0" : "=f" (__fpscr_save.__fr)); + __enables_save.__fpscr = __fpscr_save.__fpscr & 0xf8; +#else + __fpscr_save.__fr = __builtin_mffs (); + __enables_save.__fpscr = __fpscr_save.__fpscr & 0xf8; + __fpscr_save.__fpscr &= ~0xf8; + __builtin_mtfsf (0b00000011, __fpscr_save.__fr); +#endif + } + + switch (__rounding) + { + case _MM_FROUND_TO_NEAREST_INT: + __fpscr_save.__fr = __builtin_mffsl (); + __attribute__ ((fallthrough)); + case _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC: + __builtin_set_fpscr_rn (0b00); + __r = vec_rint ((__v2df) __A); + __builtin_set_fpscr_rn (__fpscr_save.__fpscr); + break; + case _MM_FROUND_TO_NEG_INF: + case _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC: + __r = vec_floor ((__v2df) __A); + break; + case _MM_FROUND_TO_POS_INF: + case _MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC: + __r = vec_ceil ((__v2df) __A); + break; + case _MM_FROUND_TO_ZERO: + case _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC: + __r = vec_trunc ((__v2df) __A); + break; + case _MM_FROUND_CUR_DIRECTION: + __r = vec_rint ((__v2df) __A); + break; + } + if (__rounding & _MM_FROUND_NO_EXC) + { + /* Restore enabled exceptions. */ + __fpscr_save.__fr = __builtin_mffsl (); + __fpscr_save.__fpscr |= __enables_save.__fpscr; + __builtin_mtfsf (0b00000011, __fpscr_save.__fr); + } + return (__m128d) __r; +} + +extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_round_sd (__m128d __A, __m128d __B, int __rounding) +{ + __B = _mm_round_pd (__B, __rounding); + __v2df __r = { ((__v2df)__B)[0], ((__v2df) __A)[1] }; + return (__m128d) __r; +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_round_ps (__m128 __A, int __rounding) +{ + __v4sf __r; + union { + double __fr; + long long __fpscr; + } __enables_save, __fpscr_save; + + if (__rounding & _MM_FROUND_NO_EXC) + { + /* Save enabled exceptions, disable all exceptions, + and preserve the rounding mode. */ +#ifdef _ARCH_PWR9 + __asm__ __volatile__ ("mffsce %0" : "=f" (__fpscr_save.__fr)); + __enables_save.__fpscr = __fpscr_save.__fpscr & 0xf8; +#else + __fpscr_save.__fr = __builtin_mffs (); + __enables_save.__fpscr = __fpscr_save.__fpscr & 0xf8; + __fpscr_save.__fpscr &= ~0xf8; + __builtin_mtfsf (0b00000011, __fpscr_save.__fr); +#endif + } + + switch (__rounding) + { + case _MM_FROUND_TO_NEAREST_INT: + __fpscr_save.__fr = __builtin_mffsl (); + __attribute__ ((fallthrough)); + case _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC: + __builtin_set_fpscr_rn (0b00); + __r = vec_rint ((__v4sf) __A); + __builtin_set_fpscr_rn (__fpscr_save.__fpscr); + break; + case _MM_FROUND_TO_NEG_INF: + case _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC: + __r = vec_floor ((__v4sf) __A); + break; + case _MM_FROUND_TO_POS_INF: + case _MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC: + __r = vec_ceil ((__v4sf) __A); + break; + case _MM_FROUND_TO_ZERO: + case _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC: + __r = vec_trunc ((__v4sf) __A); + break; + case _MM_FROUND_CUR_DIRECTION: + __r = vec_rint ((__v4sf) __A); + break; + } + if (__rounding & _MM_FROUND_NO_EXC) + { + /* Restore enabled exceptions. */ + __fpscr_save.__fr = __builtin_mffsl (); + __fpscr_save.__fpscr |= __enables_save.__fpscr; + __builtin_mtfsf (0b00000011, __fpscr_save.__fr); + } + return (__m128) __r; +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_round_ss (__m128 __A, __m128 __B, int __rounding) +{ + __B = _mm_round_ps (__B, __rounding); + __v4sf __r = (__v4sf) __A; + __r[0] = ((__v4sf)__B)[0]; + return (__m128) __r; +} + +#define _mm_ceil_pd(V) _mm_round_pd ((V), _MM_FROUND_CEIL) +#define _mm_ceil_sd(D, V) _mm_round_sd ((D), (V), _MM_FROUND_CEIL) + +#define _mm_floor_pd(V) _mm_round_pd((V), _MM_FROUND_FLOOR) +#define _mm_floor_sd(D, V) _mm_round_sd ((D), (V), _MM_FROUND_FLOOR) + +#define _mm_ceil_ps(V) _mm_round_ps ((V), _MM_FROUND_CEIL) +#define _mm_ceil_ss(D, V) _mm_round_ss ((D), (V), _MM_FROUND_CEIL) + +#define _mm_floor_ps(V) _mm_round_ps ((V), _MM_FROUND_FLOOR) +#define _mm_floor_ss(D, V) _mm_round_ss ((D), (V), _MM_FROUND_FLOOR) + extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_insert_epi8 (__m128i const __A, int const __D, int const __N) { @@ -232,70 +408,6 @@ _mm_test_mix_ones_zeros (__m128i __A, __m128i __mask) return any_ones * any_zeros; } -__inline __m128d -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_ceil_pd (__m128d __A) -{ - return (__m128d) vec_ceil ((__v2df) __A); -} - -__inline __m128d -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_ceil_sd (__m128d __A, __m128d __B) -{ - __v2df __r = vec_ceil ((__v2df) __B); - __r[1] = ((__v2df) __A)[1]; - return (__m128d) __r; -} - -__inline __m128d -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_floor_pd (__m128d __A) -{ - return (__m128d) vec_floor ((__v2df) __A); -} - -__inline __m128d -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_floor_sd (__m128d __A, __m128d __B) -{ - __v2df __r = vec_floor ((__v2df) __B); - __r[1] = ((__v2df) __A)[1]; - return (__m128d) __r; -} - -__inline __m128 -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_ceil_ps (__m128 __A) -{ - return (__m128) vec_ceil ((__v4sf) __A); -} - -__inline __m128 -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_ceil_ss (__m128 __A, __m128 __B) -{ - __v4sf __r = (__v4sf) __A; - __r[0] = __builtin_ceil (((__v4sf) __B)[0]); - return __r; -} - -__inline __m128 -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_floor_ps (__m128 __A) -{ - return (__m128) vec_floor ((__v4sf) __A); -} - -__inline __m128 -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_floor_ss (__m128 __A, __m128 __B) -{ - __v4sf __r = (__v4sf) __A; - __r[0] = __builtin_floor (((__v4sf) __B)[0]); - return __r; -} - /* Return horizontal packed word minimum and its index in bits [15:0] and bits [18:16] respectively. */ __inline __m128i diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-round3.h b/gcc/testsuite/gcc.target/powerpc/sse4_1-round3.h new file mode 100644 index 000000000000..de6cbf7be438 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-round3.h @@ -0,0 +1,81 @@ +#include +#include +#include "sse4_1-check.h" + +#define DIM(a) (sizeof (a) / sizeof (a)[0]) + +static int roundings[] = + { + _MM_FROUND_TO_NEAREST_INT, + _MM_FROUND_TO_NEG_INF, + _MM_FROUND_TO_POS_INF, + _MM_FROUND_TO_ZERO, + _MM_FROUND_CUR_DIRECTION + }; + +static int modes[] = + { + FE_TONEAREST, + FE_UPWARD, + FE_DOWNWARD, + FE_TOWARDZERO + }; + +static void +TEST (void) +{ + int i, j, ri, mi, round_save; + + round_save = fegetround (); + for (mi = 0; mi < DIM (modes); mi++) { + fesetround (modes[mi]); + for (i = 0; i < DIM (data); i++) { + for (ri = 0; ri < DIM (roundings); ri++) { + union value guess; + union value *current_answers = answers[ri]; + switch ( roundings[ri] ) { + case _MM_FROUND_TO_NEAREST_INT: + guess.x = ROUND_INTRIN (data[i].value1.x, data[i].value2.x, + _MM_FROUND_TO_NEAREST_INT); + break; + case _MM_FROUND_TO_NEG_INF: + guess.x = ROUND_INTRIN (data[i].value1.x, data[i].value2.x, + _MM_FROUND_TO_NEG_INF); + break; + case _MM_FROUND_TO_POS_INF: + guess.x = ROUND_INTRIN (data[i].value1.x, data[i].value2.x, + _MM_FROUND_TO_POS_INF); + break; + case _MM_FROUND_TO_ZERO: + guess.x = ROUND_INTRIN (data[i].value1.x, data[i].value2.x, + _MM_FROUND_TO_ZERO); + break; + case _MM_FROUND_CUR_DIRECTION: + guess.x = ROUND_INTRIN (data[i].value1.x, data[i].value2.x, + _MM_FROUND_CUR_DIRECTION); + switch ( modes[mi] ) { + case FE_TONEAREST: + current_answers = answers_NEAREST_INT; + break; + case FE_UPWARD: + current_answers = answers_POS_INF; + break; + case FE_DOWNWARD: + current_answers = answers_NEG_INF; + break; + case FE_TOWARDZERO: + current_answers = answers_ZERO; + break; + } + break; + default: + abort (); + } + for (j = 0; j < DIM (guess.f); j++) + if (guess.f[j] != current_answers[i].f[j]) + abort (); + } + } + } + fesetround (round_save); +} diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd.c new file mode 100644 index 000000000000..0528c395f233 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd.c @@ -0,0 +1,143 @@ +/* { dg-do run } */ +/* { dg-require-effective-target powerpc_vsx_ok } */ +/* { dg-options "-O2 -mvsx" } */ + +#define NO_WARN_X86_INTRINSICS 1 +#include + +#define VEC_T __m128d +#define FP_T double + +#define ROUND_INTRIN(x, ignored, mode) _mm_round_pd (x, mode) + +#include "sse4_1-round-data.h" + +struct data2 data[] = { + { .value1 = { .f = { 0.00, 0.25 } } }, + { .value1 = { .f = { 0.50, 0.75 } } }, + + { .value1 = { .f = { 0x1.ffffffffffffcp+50, 0x1.ffffffffffffdp+50 } } }, + { .value1 = { .f = { 0x1.ffffffffffffep+50, 0x1.fffffffffffffp+50 } } }, + { .value1 = { .f = { 0x1.0000000000000p+51, 0x1.0000000000001p+51 } } }, + { .value1 = { .f = { 0x1.0000000000002p+51, 0x1.0000000000003p+51 } } }, + + { .value1 = { .f = { 0x1.ffffffffffffep+51, 0x1.fffffffffffffp+51 } } }, + { .value1 = { .f = { 0x1.0000000000000p+52, 0x1.0000000000001p+52 } } }, + + { .value1 = { .f = { -0x1.0000000000001p+52, -0x1.0000000000000p+52 } } }, + { .value1 = { .f = { -0x1.fffffffffffffp+51, -0x1.ffffffffffffep+51 } } }, + + { .value1 = { .f = { -0x1.0000000000004p+51, -0x1.0000000000002p+51 } } }, + { .value1 = { .f = { -0x1.0000000000001p+51, -0x1.0000000000000p+51 } } }, + { .value1 = { .f = { -0x1.ffffffffffffcp+50, -0x1.ffffffffffffep+50 } } }, + { .value1 = { .f = { -0x1.ffffffffffffdp+50, -0x1.ffffffffffffcp+50 } } }, + + { .value1 = { .f = { -1.00, -0.75 } } }, + { .value1 = { .f = { -0.50, -0.25 } } } +}; + +union value answers_NEAREST_INT[] = { + { .f = { 0.00, 0.00 } }, + { .f = { 0.00, 1.00 } }, + + { .f = { 0x1.ffffffffffffcp+50, 0x1.ffffffffffffcp+50 } }, + { .f = { 0x1.0000000000000p+51, 0x1.0000000000000p+51 } }, + { .f = { 0x1.0000000000000p+51, 0x1.0000000000000p+51 } }, + { .f = { 0x1.0000000000002p+51, 0x1.0000000000004p+51 } }, + + { .f = { 0x1.ffffffffffffep+51, 0x1.0000000000000p+52 } }, + { .f = { 0x1.0000000000000p+52, 0x1.0000000000001p+52 } }, + + { .f = { -0x1.0000000000001p+52, -0x1.0000000000000p+52 } }, + { .f = { -0x1.0000000000000p+52, -0x1.ffffffffffffep+51 } }, + + { .f = { -0x1.0000000000004p+51, -0x1.0000000000002p+51 } }, + { .f = { -0x1.0000000000000p+51, -0x1.0000000000000p+51 } }, + { .f = { -0x1.ffffffffffffcp+50, -0x1.0000000000000p+51 } }, + { .f = { -0x1.ffffffffffffcp+50, -0x1.ffffffffffffcp+50 } }, + + { .f = { -1.00, -1.00 } }, + { .f = { 0.00, 0.00 } } +}; + +union value answers_NEG_INF[] = { + { .f = { 0.00, 0.00 } }, + { .f = { 0.00, 0.00 } }, + + { .f = { 0x1.ffffffffffffcp+50, 0x1.ffffffffffffcp+50 } }, + { .f = { 0x1.ffffffffffffcp+50, 0x1.ffffffffffffcp+50 } }, + { .f = { 0x1.0000000000000p+51, 0x1.0000000000000p+51 } }, + { .f = { 0x1.0000000000002p+51, 0x1.0000000000002p+51 } }, + + { .f = { 0x1.ffffffffffffep+51, 0x1.ffffffffffffep+51 } }, + { .f = { 0x1.0000000000000p+52, 0x1.0000000000001p+52 } }, + + { .f = { -0x1.0000000000001p+52, -0x1.0000000000000p+52 } }, + { .f = { -0x1.0000000000000p+52, -0x1.ffffffffffffep+51 } }, + + { .f = { -0x1.0000000000004p+51, -0x1.0000000000002p+51 } }, + { .f = { -0x1.0000000000002p+51, -0x1.0000000000000p+51 } }, + { .f = { -0x1.ffffffffffffcp+50, -0x1.0000000000000p+51 } }, + { .f = { -0x1.0000000000000p+51, -0x1.ffffffffffffcp+50 } }, + + { .f = { -1.00, -1.00 } }, + { .f = { -1.00, -1.00 } } +}; + +union value answers_POS_INF[] = { + { .f = { 0.00, 1.00 } }, + { .f = { 1.00, 1.00 } }, + + { .f = { 0x1.ffffffffffffcp+50, 0x1.0000000000000p+51 } }, + { .f = { 0x1.0000000000000p+51, 0x1.0000000000000p+51 } }, + { .f = { 0x1.0000000000000p+51, 0x1.0000000000002p+51 } }, + { .f = { 0x1.0000000000002p+51, 0x1.0000000000004p+51 } }, + + { .f = { 0x1.ffffffffffffep+51, 0x1.0000000000000p+52 } }, + { .f = { 0x1.0000000000000p+52, 0x1.0000000000001p+52 } }, + + { .f = { -0x1.0000000000001p+52, -0x1.0000000000000p+52 } }, + { .f = { -0x1.ffffffffffffep+51, -0x1.ffffffffffffep+51 } }, + + { .f = { -0x1.0000000000004p+51, -0x1.0000000000002p+51 } }, + { .f = { -0x1.0000000000000p+51, -0x1.0000000000000p+51 } }, + { .f = { -0x1.ffffffffffffcp+50, -0x1.ffffffffffffcp+50 } }, + { .f = { -0x1.ffffffffffffcp+50, -0x1.ffffffffffffcp+50 } }, + + { .f = { -1.00, 0.00 } }, + { .f = { 0.00, 0.00 } } +}; + +union value answers_ZERO[] = { + { .f = { 0.00, 0.00 } }, + { .f = { 0.00, 0.00 } }, + + { .f = { 0x1.ffffffffffffcp+50, 0x1.ffffffffffffcp+50 } }, + { .f = { 0x1.ffffffffffffcp+50, 0x1.ffffffffffffcp+50 } }, + { .f = { 0x1.0000000000000p+51, 0x1.0000000000000p+51 } }, + { .f = { 0x1.0000000000002p+51, 0x1.0000000000002p+51 } }, + + { .f = { 0x1.ffffffffffffep+51, 0x1.ffffffffffffep+51 } }, + { .f = { 0x1.0000000000000p+52, 0x1.0000000000001p+52 } }, + + { .f = { -0x1.0000000000001p+52, -0x1.0000000000000p+52 } }, + { .f = { -0x1.ffffffffffffep+51, -0x1.ffffffffffffep+51 } }, + + { .f = { -0x1.0000000000004p+51, -0x1.0000000000002p+51 } }, + { .f = { -0x1.0000000000000p+51, -0x1.0000000000000p+51 } }, + { .f = { -0x1.ffffffffffffcp+50, -0x1.ffffffffffffcp+50 } }, + { .f = { -0x1.ffffffffffffcp+50, -0x1.ffffffffffffcp+50 } }, + + { .f = { -1.00, 0.00 } }, + { .f = { 0.00, 0.00 } } +}; + +union value *answers[] = { + answers_NEAREST_INT, + answers_NEG_INF, + answers_POS_INF, + answers_ZERO, + 0 /* CUR_DIRECTION answers depend on current rounding mode. */ +}; + +#include "sse4_1-round3.h" diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-roundps.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundps.c new file mode 100644 index 000000000000..6b5362e07590 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundps.c @@ -0,0 +1,98 @@ +/* { dg-do run } */ +/* { dg-require-effective-target powerpc_vsx_ok } */ +/* { dg-options "-O2 -mvsx" } */ + +#define NO_WARN_X86_INTRINSICS 1 +#include + +#define VEC_T __m128 +#define FP_T float + +#define ROUND_INTRIN(x, ignored, mode) _mm_round_ps (x, mode) + +#include "sse4_1-round-data.h" + +struct data2 data[] = { + { .value1 = { .f = { 0.00, 0.25, 0.50, 0.75 } } }, + + { .value1 = { .f = { 0x1.fffff8p+21, 0x1.fffffap+21, + 0x1.fffffcp+21, 0x1.fffffep+21 } } }, + { .value1 = { .f = { 0x1.fffffap+22, 0x1.fffffcp+22, + 0x1.fffffep+22, 0x1.fffffep+23 } } }, + { .value1 = { .f = { -0x1.fffffep+23, -0x1.fffffep+22, + -0x1.fffffcp+22, -0x1.fffffap+22 } } }, + { .value1 = { .f = { -0x1.fffffep+21, -0x1.fffffcp+21, + -0x1.fffffap+21, -0x1.fffff8p+21 } } }, + + { .value1 = { .f = { -1.00, -0.75, -0.50, -0.25 } } } +}; + +union value answers_NEAREST_INT[] = { + { .f = { 0.00, 0.00, 0.00, 1.00 } }, + + { .f = { 0x1.fffff8p+21, 0x1.fffff8p+21, + 0x1.000000p+22, 0x1.000000p+22 } }, + { .f = { 0x1.fffff8p+22, 0x1.fffffcp+22, + 0x1.000000p+23, 0x1.fffffep+23 } }, + { .f = { -0x1.fffffep+23, -0x1.000000p+23, + -0x1.fffffcp+22, -0x1.fffff8p+22 } }, + { .f = { -0x1.000000p+22, -0x1.000000p+22, + -0x1.fffff8p+21, -0x1.fffff8p+21 } }, + + { .f = { -1.00, -1.00, 0.00, 0.00 } } +}; + +union value answers_NEG_INF[] = { + { .f = { 0.00, 0.00, 0.00, 0.00 } }, + + { .f = { 0x1.fffff8p+21, 0x1.fffff8p+21, + 0x1.fffff8p+21, 0x1.fffff8p+21 } }, + { .f = { 0x1.fffff8p+22, 0x1.fffffcp+22, + 0x1.fffffcp+22, 0x1.fffffep+23 } }, + { .f = { -0x1.fffffep+23, -0x1.000000p+23, + -0x1.fffffcp+22, -0x1.fffffcp+22 } }, + { .f = { -0x1.000000p+22, -0x1.000000p+22, + -0x1.000000p+22, -0x1.fffff8p+21 } }, + + { .f = { -1.00, -1.00, -1.00, -1.00 } } +}; + +union value answers_POS_INF[] = { + { .f = { 0.00, 1.00, 1.00, 1.00 } }, + + { .f = { 0x1.fffff8p+21, 0x1.000000p+22, + 0x1.000000p+22, 0x1.000000p+22 } }, + { .f = { 0x1.fffffcp+22, 0x1.fffffcp+22, + 0x1.000000p+23, 0x1.fffffep+23 } }, + { .f = { -0x1.fffffep+23, -0x1.fffffcp+22, + -0x1.fffffcp+22, -0x1.fffff8p+22 } }, + { .f = { -0x1.fffff8p+21, -0x1.fffff8p+21, + -0x1.fffff8p+21, -0x1.fffff8p+21 } }, + + { .f = { -1.00, 0.00, 0.00, 0.00 } } +}; + +union value answers_ZERO[] = { + { .f = { 0.00, 0.00, 0.00, 0.00 } }, + + { .f = { 0x1.fffff8p+21, 0x1.fffff8p+21, + 0x1.fffff8p+21, 0x1.fffff8p+21 } }, + { .f = { 0x1.fffff8p+22, 0x1.fffffcp+22, + 0x1.fffffcp+22, 0x1.fffffep+23 } }, + { .f = { -0x1.fffffep+23, -0x1.fffffcp+22, + -0x1.fffffcp+22, -0x1.fffff8p+22 } }, + { .f = { -0x1.fffff8p+21, -0x1.fffff8p+21, + -0x1.fffff8p+21, -0x1.fffff8p+21 } }, + + { .f = { -1.00, 0.00, 0.00, 0.00 } } +}; + +union value *answers[] = { + answers_NEAREST_INT, + answers_NEG_INF, + answers_POS_INF, + answers_ZERO, + 0 /* CUR_DIRECTION answers depend on current rounding mode. */ +}; + +#include "sse4_1-round3.h" diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-roundsd.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundsd.c new file mode 100644 index 000000000000..2b0bad6469df --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundsd.c @@ -0,0 +1,256 @@ +/* { dg-do run } */ +/* { dg-require-effective-target powerpc_vsx_ok } */ +/* { dg-options "-O2 -mvsx" } */ + +#include +#define NO_WARN_X86_INTRINSICS 1 +#include + +#define VEC_T __m128d +#define FP_T double + +#define ROUND_INTRIN(x, y, mode) _mm_round_sd (x, y, mode) + +#include "sse4_1-round-data.h" + +static struct data2 data[] = { + { .value1 = { .f = { IGNORED, PASSTHROUGH } }, + .value2 = { .f = { 0.00, IGNORED } } }, + { .value1 = { .f = { IGNORED, PASSTHROUGH } }, + .value2 = { .f = { 0.25, IGNORED } } }, + { .value1 = { .f = { IGNORED, PASSTHROUGH } }, + .value2 = { .f = { 0.50, IGNORED } } }, + { .value1 = { .f = { IGNORED, PASSTHROUGH } }, + .value2 = { .f = { 0.75, IGNORED } } }, + + { .value1 = { .f = { IGNORED, PASSTHROUGH } }, + .value2 = { .f = { 0x1.ffffffffffffcp+50, IGNORED } } }, + { .value1 = { .f = { IGNORED, PASSTHROUGH } }, + .value2 = { .f = { 0x1.ffffffffffffdp+50, IGNORED } } }, + { .value1 = { .f = { IGNORED, PASSTHROUGH } }, + .value2 = { .f = { 0x1.ffffffffffffep+50, IGNORED } } }, + { .value1 = { .f = { IGNORED, PASSTHROUGH } }, + .value2 = { .f = { 0x1.fffffffffffffp+50, IGNORED } } }, + { .value1 = { .f = { IGNORED, PASSTHROUGH } }, + .value2 = { .f = { 0x1.0000000000000p+51, IGNORED } } }, + { .value1 = { .f = { IGNORED, PASSTHROUGH } }, + .value2 = { .f = { 0x1.0000000000001p+51, IGNORED } } }, + { .value1 = { .f = { IGNORED, PASSTHROUGH } }, + .value2 = { .f = { 0x1.0000000000002p+51, IGNORED } } }, + { .value1 = { .f = { IGNORED, PASSTHROUGH } }, + .value2 = { .f = { 0x1.0000000000003p+51, IGNORED } } }, + + { .value1 = { .f = { IGNORED, PASSTHROUGH } }, + .value2 = { .f = { 0x1.ffffffffffffep+51, IGNORED } } }, + { .value1 = { .f = { IGNORED, PASSTHROUGH } }, + .value2 = { .f = { 0x1.fffffffffffffp+51, IGNORED } } }, + { .value1 = { .f = { IGNORED, PASSTHROUGH } }, + .value2 = { .f = { 0x1.0000000000000p+52, IGNORED } } }, + { .value1 = { .f = { IGNORED, PASSTHROUGH } }, + .value2 = { .f = { 0x1.0000000000001p+52, IGNORED } } }, + + { .value1 = { .f = { IGNORED, PASSTHROUGH } }, + .value2 = { .f = { -0x1.0000000000001p+52, IGNORED } } }, + { .value1 = { .f = { IGNORED, PASSTHROUGH } }, + .value2 = { .f = { -0x1.0000000000000p+52, IGNORED } } }, + { .value1 = { .f = { IGNORED, PASSTHROUGH } }, + .value2 = { .f = { -0x1.fffffffffffffp+51, IGNORED } } }, + { .value1 = { .f = { IGNORED, PASSTHROUGH } }, + .value2 = { .f = { -0x1.ffffffffffffep+51, IGNORED } } }, + + { .value1 = { .f = { IGNORED, PASSTHROUGH } }, + .value2 = { .f = { -0x1.0000000000004p+51, IGNORED } } }, + { .value1 = { .f = { IGNORED, PASSTHROUGH } }, + .value2 = { .f = { -0x1.0000000000002p+51, IGNORED } } }, + { .value1 = { .f = { IGNORED, PASSTHROUGH } }, + .value2 = { .f = { -0x1.0000000000001p+51, IGNORED } } }, + { .value1 = { .f = { IGNORED, PASSTHROUGH } }, + .value2 = { .f = { -0x1.0000000000000p+51, IGNORED } } }, + { .value1 = { .f = { IGNORED, PASSTHROUGH } }, + .value2 = { .f = { -0x1.ffffffffffffcp+50, IGNORED } } }, + { .value1 = { .f = { IGNORED, PASSTHROUGH } }, + .value2 = { .f = { -0x1.ffffffffffffep+50, IGNORED } } }, + { .value1 = { .f = { IGNORED, PASSTHROUGH } }, + .value2 = { .f = { -0x1.ffffffffffffdp+50, IGNORED } } }, + { .value1 = { .f = { IGNORED, PASSTHROUGH } }, + .value2 = { .f = { -0x1.ffffffffffffcp+50, IGNORED } } }, + + { .value1 = { .f = { IGNORED, PASSTHROUGH } }, + .value2 = { .f = { -1.00, IGNORED } } }, + { .value1 = { .f = { IGNORED, PASSTHROUGH } }, + .value2 = { .f = { -0.75, IGNORED } } }, + { .value1 = { .f = { IGNORED, PASSTHROUGH } }, + .value2 = { .f = { -0.50, IGNORED } } }, + { .value1 = { .f = { IGNORED, PASSTHROUGH } }, + .value2 = { .f = { -0.25, IGNORED } } } +}; + +static union value answers_NEAREST_INT[] = { + { .f = { 0.00, PASSTHROUGH } }, + { .f = { 0.00, PASSTHROUGH } }, + { .f = { 0.00, PASSTHROUGH } }, + { .f = { 1.00, PASSTHROUGH } }, + + { .f = { 0x1.ffffffffffffcp+50, PASSTHROUGH } }, + { .f = { 0x1.ffffffffffffcp+50, PASSTHROUGH } }, + { .f = { 0x1.0000000000000p+51, PASSTHROUGH } }, + { .f = { 0x1.0000000000000p+51, PASSTHROUGH } }, + { .f = { 0x1.0000000000000p+51, PASSTHROUGH } }, + { .f = { 0x1.0000000000000p+51, PASSTHROUGH } }, + { .f = { 0x1.0000000000002p+51, PASSTHROUGH } }, + { .f = { 0x1.0000000000004p+51, PASSTHROUGH } }, + + { .f = { 0x1.ffffffffffffep+51, PASSTHROUGH } }, + { .f = { 0x1.0000000000000p+52, PASSTHROUGH } }, + { .f = { 0x1.0000000000000p+52, PASSTHROUGH } }, + { .f = { 0x1.0000000000001p+52, PASSTHROUGH } }, + + { .f = { -0x1.0000000000001p+52, PASSTHROUGH } }, + { .f = { -0x1.0000000000000p+52, PASSTHROUGH } }, + { .f = { -0x1.0000000000000p+52, PASSTHROUGH } }, + { .f = { -0x1.ffffffffffffep+51, PASSTHROUGH } }, + + { .f = { -0x1.0000000000004p+51, PASSTHROUGH } }, + { .f = { -0x1.0000000000002p+51, PASSTHROUGH } }, + { .f = { -0x1.0000000000000p+51, PASSTHROUGH } }, + { .f = { -0x1.0000000000000p+51, PASSTHROUGH } }, + { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } }, + { .f = { -0x1.0000000000000p+51, PASSTHROUGH } }, + { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } }, + { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } }, + + { .f = { -1.00, PASSTHROUGH } }, + { .f = { -1.00, PASSTHROUGH } }, + { .f = { -0.00, PASSTHROUGH } }, + { .f = { -0.00, PASSTHROUGH } } +}; + +static union value answers_NEG_INF[] = { + { .f = { 0.00, PASSTHROUGH } }, + { .f = { 0.00, PASSTHROUGH } }, + { .f = { 0.00, PASSTHROUGH } }, + { .f = { 0.00, PASSTHROUGH } }, + + { .f = { 0x1.ffffffffffffcp+50, PASSTHROUGH } }, + { .f = { 0x1.ffffffffffffcp+50, PASSTHROUGH } }, + { .f = { 0x1.ffffffffffffcp+50, PASSTHROUGH } }, + { .f = { 0x1.ffffffffffffcp+50, PASSTHROUGH } }, + { .f = { 0x1.0000000000000p+51, PASSTHROUGH } }, + { .f = { 0x1.0000000000000p+51, PASSTHROUGH } }, + { .f = { 0x1.0000000000002p+51, PASSTHROUGH } }, + { .f = { 0x1.0000000000002p+51, PASSTHROUGH } }, + + { .f = { 0x1.ffffffffffffep+51, PASSTHROUGH } }, + { .f = { 0x1.ffffffffffffep+51, PASSTHROUGH } }, + { .f = { 0x1.0000000000000p+52, PASSTHROUGH } }, + { .f = { 0x1.0000000000001p+52, PASSTHROUGH } }, + + { .f = { -0x1.0000000000001p+52, PASSTHROUGH } }, + { .f = { -0x1.0000000000000p+52, PASSTHROUGH } }, + { .f = { -0x1.0000000000000p+52, PASSTHROUGH } }, + { .f = { -0x1.ffffffffffffep+51, PASSTHROUGH } }, + + { .f = { -0x1.0000000000004p+51, PASSTHROUGH } }, + { .f = { -0x1.0000000000002p+51, PASSTHROUGH } }, + { .f = { -0x1.0000000000002p+51, PASSTHROUGH } }, + { .f = { -0x1.0000000000000p+51, PASSTHROUGH } }, + { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } }, + { .f = { -0x1.0000000000000p+51, PASSTHROUGH } }, + { .f = { -0x1.0000000000000p+51, PASSTHROUGH } }, + { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } }, + + { .f = { -1.00, PASSTHROUGH } }, + { .f = { -1.00, PASSTHROUGH } }, + { .f = { -1.00, PASSTHROUGH } }, + { .f = { -1.00, PASSTHROUGH } } +}; + +static union value answers_POS_INF[] = { + { .f = { 0.00, PASSTHROUGH } }, + { .f = { 1.00, PASSTHROUGH } }, + { .f = { 1.00, PASSTHROUGH } }, + { .f = { 1.00, PASSTHROUGH } }, + + { .f = { 0x1.ffffffffffffcp+50, PASSTHROUGH } }, + { .f = { 0x1.0000000000000p+51, PASSTHROUGH } }, + { .f = { 0x1.0000000000000p+51, PASSTHROUGH } }, + { .f = { 0x1.0000000000000p+51, PASSTHROUGH } }, + { .f = { 0x1.0000000000000p+51, PASSTHROUGH } }, + { .f = { 0x1.0000000000002p+51, PASSTHROUGH } }, + { .f = { 0x1.0000000000002p+51, PASSTHROUGH } }, + { .f = { 0x1.0000000000004p+51, PASSTHROUGH } }, + + { .f = { 0x1.ffffffffffffep+51, PASSTHROUGH } }, + { .f = { 0x1.0000000000000p+52, PASSTHROUGH } }, + { .f = { 0x1.0000000000000p+52, PASSTHROUGH } }, + { .f = { 0x1.0000000000001p+52, PASSTHROUGH } }, + + { .f = { -0x1.0000000000001p+52, PASSTHROUGH } }, + { .f = { -0x1.0000000000000p+52, PASSTHROUGH } }, + { .f = { -0x1.ffffffffffffep+51, PASSTHROUGH } }, + { .f = { -0x1.ffffffffffffep+51, PASSTHROUGH } }, + + { .f = { -0x1.0000000000004p+51, PASSTHROUGH } }, + { .f = { -0x1.0000000000002p+51, PASSTHROUGH } }, + { .f = { -0x1.0000000000000p+51, PASSTHROUGH } }, + { .f = { -0x1.0000000000000p+51, PASSTHROUGH } }, + { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } }, + { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } }, + { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } }, + { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } }, + + { .f = { -1.00, PASSTHROUGH } }, + { .f = { 0.00, PASSTHROUGH } }, + { .f = { 0.00, PASSTHROUGH } }, + { .f = { 0.00, PASSTHROUGH } } +}; + +static union value answers_ZERO[] = { + { .f = { 0.00, PASSTHROUGH } }, + { .f = { 0.00, PASSTHROUGH } }, + { .f = { 0.00, PASSTHROUGH } }, + { .f = { 0.00, PASSTHROUGH } }, + + { .f = { 0x1.ffffffffffffcp+50, PASSTHROUGH } }, + { .f = { 0x1.ffffffffffffcp+50, PASSTHROUGH } }, + { .f = { 0x1.ffffffffffffcp+50, PASSTHROUGH } }, + { .f = { 0x1.ffffffffffffcp+50, PASSTHROUGH } }, + { .f = { 0x1.0000000000000p+51, PASSTHROUGH } }, + { .f = { 0x1.0000000000000p+51, PASSTHROUGH } }, + { .f = { 0x1.0000000000002p+51, PASSTHROUGH } }, + { .f = { 0x1.0000000000002p+51, PASSTHROUGH } }, + + { .f = { 0x1.ffffffffffffep+51, PASSTHROUGH } }, + { .f = { 0x1.ffffffffffffep+51, PASSTHROUGH } }, + { .f = { 0x1.0000000000000p+52, PASSTHROUGH } }, + { .f = { 0x1.0000000000001p+52, PASSTHROUGH } }, + + { .f = { -0x1.0000000000001p+52, PASSTHROUGH } }, + { .f = { -0x1.0000000000000p+52, PASSTHROUGH } }, + { .f = { -0x1.ffffffffffffep+51, PASSTHROUGH } }, + { .f = { -0x1.ffffffffffffep+51, PASSTHROUGH } }, + + { .f = { -0x1.0000000000004p+51, PASSTHROUGH } }, + { .f = { -0x1.0000000000002p+51, PASSTHROUGH } }, + { .f = { -0x1.0000000000000p+51, PASSTHROUGH } }, + { .f = { -0x1.0000000000000p+51, PASSTHROUGH } }, + { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } }, + { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } }, + { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } }, + { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } }, + + { .f = { -1.00, PASSTHROUGH } }, + { .f = { 0.00, PASSTHROUGH } }, + { .f = { 0.00, PASSTHROUGH } }, + { .f = { 0.00, PASSTHROUGH } } +}; + +union value *answers[] = { + answers_NEAREST_INT, + answers_NEG_INF, + answers_POS_INF, + answers_ZERO, + 0 /* CUR_DIRECTION answers depend on current rounding mode. */ +}; + +#include "sse4_1-round3.h" diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-roundss.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundss.c new file mode 100644 index 000000000000..3154310314a1 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundss.c @@ -0,0 +1,208 @@ +/* { dg-do run } */ +/* { dg-require-effective-target powerpc_vsx_ok } */ +/* { dg-options "-O2 -mvsx" } */ + +#include +#define NO_WARN_X86_INTRINSICS 1 +#include + +#define VEC_T __m128 +#define FP_T float + +#define ROUND_INTRIN(x, y, mode) _mm_round_ss (x, y, mode) + +#include "sse4_1-round-data.h" + +static struct data2 data[] = { + { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + .value2 = { .f = { 0.00, IGNORED, IGNORED, IGNORED } } }, + { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + .value2 = { .f = { 0.25, IGNORED, IGNORED, IGNORED } } }, + { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + .value2 = { .f = { 0.50, IGNORED, IGNORED, IGNORED } } }, + { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + .value2 = { .f = { 0.75, IGNORED, IGNORED, IGNORED } } }, + + { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + .value2 = { .f = { 0x1.fffff8p+21, IGNORED, IGNORED, IGNORED } } }, + { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + .value2 = { .f = { 0x1.fffffap+21, IGNORED, IGNORED, IGNORED } } }, + { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + .value2 = { .f = { 0x1.fffffcp+21, IGNORED, IGNORED, IGNORED } } }, + { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + .value2 = { .f = { 0x1.fffffep+21, IGNORED, IGNORED, IGNORED } } }, + + { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + .value2 = { .f = { 0x1.fffffap+22, IGNORED, IGNORED, IGNORED } } }, + { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + .value2 = { .f = { 0x1.fffffcp+22, IGNORED, IGNORED, IGNORED } } }, + { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + .value2 = { .f = { 0x1.fffffep+22, IGNORED, IGNORED, IGNORED } } }, + { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + .value2 = { .f = { 0x1.fffffep+23, IGNORED, IGNORED, IGNORED } } }, + + { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + .value2 = { .f = { -0x1.fffffep+23, IGNORED, IGNORED, IGNORED } } }, + { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + .value2 = { .f = { -0x1.fffffep+22, IGNORED, IGNORED, IGNORED } } }, + { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + .value2 = { .f = { -0x1.fffffcp+22, IGNORED, IGNORED, IGNORED } } }, + { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + .value2 = { .f = { -0x1.fffffap+22, IGNORED, IGNORED, IGNORED } } }, + + { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + .value2 = { .f = { -0x1.fffffep+21, IGNORED, IGNORED, IGNORED } } }, + { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + .value2 = { .f = { -0x1.fffffcp+21, IGNORED, IGNORED, IGNORED } } }, + { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + .value2 = { .f = { -0x1.fffffap+21, IGNORED, IGNORED, IGNORED } } }, + { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + .value2 = { .f = { -0x1.fffff8p+21, IGNORED, IGNORED, IGNORED } } }, + + { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + .value2 = { .f = { -1.00, IGNORED, IGNORED, IGNORED } } }, + { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + .value2 = { .f = { -0.75, IGNORED, IGNORED, IGNORED } } }, + { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + .value2 = { .f = { -0.50, IGNORED, IGNORED, IGNORED } } }, + { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + .value2 = { .f = { -0.25, IGNORED, IGNORED, IGNORED } } } +}; + +static union value answers_NEAREST_INT[] = { + { .f = { 0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { 0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { 0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { 1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + + { .f = { 0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { 0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { 0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { 0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + + { .f = { 0x1.fffff8p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { 0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { 0x1.000000p+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { 0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + + { .f = { -0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { -0x1.000000p+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { -0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { -0x1.fffff8p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + + { .f = { -0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { -0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + + { .f = { -1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { -1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { -0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { -0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } } +}; + +static union value answers_NEG_INF[] = { + { .f = { 0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { 0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { 0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { 0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + + { .f = { 0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { 0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { 0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { 0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + + { .f = { 0x1.fffff8p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { 0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { 0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { 0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + + { .f = { -0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { -0x1.000000p+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { -0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { -0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + + { .f = { -0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { -0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { -0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + + { .f = { -1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { -1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { -1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { -1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } } +}; + +static union value answers_POS_INF[] = { + { .f = { 0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { 1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { 1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { 1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + + { .f = { 0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { 0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { 0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { 0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + + { .f = { 0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { 0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { 0x1.000000p+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { 0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + + { .f = { -0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { -0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { -0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { -0x1.fffff8p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + + { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + + { .f = { -1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { 0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { 0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { 0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } } +}; + +static union value answers_ZERO[] = { + { .f = { 0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { 0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { 0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { 0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + + { .f = { 0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { 0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { 0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { 0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + + { .f = { 0x1.fffff8p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { 0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { 0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { 0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + + { .f = { -0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { -0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { -0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { -0x1.fffff8p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + + { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + + { .f = { -1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { 0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { 0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }, + { .f = { 0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } } +}; + +union value *answers[] = { + answers_NEAREST_INT, + answers_NEG_INF, + answers_POS_INF, + answers_ZERO, + 0 /* CUR_DIRECTION answers depend on current rounding mode. */ +}; + +#include "sse4_1-round3.h" From patchwork Fri Aug 20 19:27:22 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Paul A. Clarke" X-Patchwork-Id: 1519168 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=QCdGvhYm; dkim-atps=neutral Received: from sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4GrsFX4r5Bz9sRf for ; Sat, 21 Aug 2021 05:31:28 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 73AF23853807 for ; Fri, 20 Aug 2021 19:31:26 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 73AF23853807 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1629487886; bh=p35SqRMZIo19ti5VR3lDLN+LQbA/Aw3Yg8t3I6qdbQ8=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=QCdGvhYm/JLWaENEn7rPNYVUY83Bp0z5U1Ai9gQgkjQlbL+Wg7Z0KVA3KkK91XvEN rIZXFKCwEOihheVNBUYt2FcBVbXF5WftC6ffIBgM7g3Iuq3Roz4O9xFDR1hz3C9ly4 fPBStYyRznc88yozXvVsSj7Y63FUtllg+Cu0+gn0= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by sourceware.org (Postfix) with ESMTPS id 2253D3858026 for ; Fri, 20 Aug 2021 19:27:44 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 2253D3858026 Received: from pps.filterd (m0098409.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.43/8.16.0.43) with SMTP id 17KJ72Xi143944; Fri, 20 Aug 2021 15:27:43 -0400 Received: from ppma03dal.us.ibm.com (b.bd.3ea9.ip4.static.sl-reverse.com [169.62.189.11]) by mx0a-001b2d01.pphosted.com with ESMTP id 3ahjgcma92-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 20 Aug 2021 15:27:43 -0400 Received: from pps.filterd (ppma03dal.us.ibm.com [127.0.0.1]) by ppma03dal.us.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 17KJ8Xhm020414; Fri, 20 Aug 2021 19:27:42 GMT Received: from b01cxnp23034.gho.pok.ibm.com (b01cxnp23034.gho.pok.ibm.com [9.57.198.29]) by ppma03dal.us.ibm.com with ESMTP id 3ae5fjatrw-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 20 Aug 2021 19:27:42 +0000 Received: from b01ledav005.gho.pok.ibm.com (b01ledav005.gho.pok.ibm.com [9.57.199.110]) by b01cxnp23034.gho.pok.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 17KJRfqe40632632 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 20 Aug 2021 19:27:41 GMT Received: from b01ledav005.gho.pok.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 27059AE067; Fri, 20 Aug 2021 19:27:41 +0000 (GMT) Received: from b01ledav005.gho.pok.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id E7B5CAE063; Fri, 20 Aug 2021 19:27:40 +0000 (GMT) Received: from localhost (unknown [9.77.137.12]) by b01ledav005.gho.pok.ibm.com (Postfix) with ESMTP; Fri, 20 Aug 2021 19:27:40 +0000 (GMT) To: gcc-patches@gcc.gnu.org Subject: [PATCH v2 2/6] rs6000: Support SSE4.1 "min" and "max" intrinsics Date: Fri, 20 Aug 2021 14:27:22 -0500 Message-Id: <20210820192726.1575524-3-pc@us.ibm.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20210820192726.1575524-1-pc@us.ibm.com> References: <20210820192726.1575524-1-pc@us.ibm.com> MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: lLB0D7uqZKWrJ9DVcV1tphBXjpwGLFNC X-Proofpoint-GUID: lLB0D7uqZKWrJ9DVcV1tphBXjpwGLFNC X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.391, 18.0.790 definitions=2021-08-20_08:2021-08-20, 2021-08-20 signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 suspectscore=0 spamscore=0 bulkscore=0 lowpriorityscore=0 priorityscore=1501 mlxscore=0 phishscore=0 clxscore=1015 impostorscore=0 mlxlogscore=915 malwarescore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2107140000 definitions=main-2108200107 X-Spam-Status: No, score=-13.3 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: "Paul A. Clarke via Gcc-patches" From: "Paul A. Clarke" Reply-To: "Paul A. Clarke" Cc: wschmidt@linux.ibm.com, segher@kernel.crashing.org Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" Function signatures and decorations match gcc/config/i386/smmintrin.h. Also, copy tests for _mm_min_epi8, _mm_min_epu16, _mm_min_epi32, _mm_min_epu32, _mm_max_epi8, _mm_max_epu16, _mm_max_epi32, _mm_max_epu32 from gcc/testsuite/gcc.target/i386. sse4_1-pmaxsb.c and sse4_1-pminsb.c were modified from using "char" types to "signed char" types, because the default is unsigned on powerpc. 2021-08-20 Paul A. Clarke gcc * config/rs6000/smmintrin.h (_mm_min_epi8, _mm_min_epu16, _mm_min_epi32, _mm_min_epu32, _mm_max_epi8, _mm_max_epu16, _mm_max_epi32, _mm_max_epu32): New. gcc/testsuite * gcc.target/powerpc/sse4_1-pmaxsb.c: Copy from gcc.target/i386. * gcc.target/powerpc/sse4_1-pmaxsd.c: Same. * gcc.target/powerpc/sse4_1-pmaxud.c: Same. * gcc.target/powerpc/sse4_1-pmaxuw.c: Same. * gcc.target/powerpc/sse4_1-pminsb.c: Same. * gcc.target/powerpc/sse4_1-pminsd.c: Same. * gcc.target/powerpc/sse4_1-pminud.c: Same. * gcc.target/powerpc/sse4_1-pminuw.c: Same. --- v2: - Added "extern" to functions to maintain compatible decorations with like implementations in gcc/config/i386. - Removed "-Wno-psabi" from tests as unnecessary, per v1 review. - Noted testing in patch series cover letter. gcc/config/rs6000/smmintrin.h | 56 +++++++++++++++++++ .../gcc.target/powerpc/sse4_1-pmaxsb.c | 46 +++++++++++++++ .../gcc.target/powerpc/sse4_1-pmaxsd.c | 46 +++++++++++++++ .../gcc.target/powerpc/sse4_1-pmaxud.c | 47 ++++++++++++++++ .../gcc.target/powerpc/sse4_1-pmaxuw.c | 47 ++++++++++++++++ .../gcc.target/powerpc/sse4_1-pminsb.c | 46 +++++++++++++++ .../gcc.target/powerpc/sse4_1-pminsd.c | 46 +++++++++++++++ .../gcc.target/powerpc/sse4_1-pminud.c | 47 ++++++++++++++++ .../gcc.target/powerpc/sse4_1-pminuw.c | 47 ++++++++++++++++ 9 files changed, 428 insertions(+) create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxsb.c create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxsd.c create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxud.c create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxuw.c create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pminsb.c create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pminsd.c create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pminud.c create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pminuw.c diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h index a6b88d313ad0..505fe4ce22a8 100644 --- a/gcc/config/rs6000/smmintrin.h +++ b/gcc/config/rs6000/smmintrin.h @@ -408,6 +408,62 @@ _mm_test_mix_ones_zeros (__m128i __A, __m128i __mask) return any_ones * any_zeros; } +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_min_epi8 (__m128i __X, __m128i __Y) +{ + return (__m128i) vec_min ((__v16qi)__X, (__v16qi)__Y); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_min_epu16 (__m128i __X, __m128i __Y) +{ + return (__m128i) vec_min ((__v8hu)__X, (__v8hu)__Y); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_min_epi32 (__m128i __X, __m128i __Y) +{ + return (__m128i) vec_min ((__v4si)__X, (__v4si)__Y); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_min_epu32 (__m128i __X, __m128i __Y) +{ + return (__m128i) vec_min ((__v4su)__X, (__v4su)__Y); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_max_epi8 (__m128i __X, __m128i __Y) +{ + return (__m128i) vec_max ((__v16qi)__X, (__v16qi)__Y); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_max_epu16 (__m128i __X, __m128i __Y) +{ + return (__m128i) vec_max ((__v8hu)__X, (__v8hu)__Y); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_max_epi32 (__m128i __X, __m128i __Y) +{ + return (__m128i) vec_max ((__v4si)__X, (__v4si)__Y); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_max_epu32 (__m128i __X, __m128i __Y) +{ + return (__m128i) vec_max ((__v4su)__X, (__v4su)__Y); +} + /* Return horizontal packed word minimum and its index in bits [15:0] and bits [18:16] respectively. */ __inline __m128i diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxsb.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxsb.c new file mode 100644 index 000000000000..7a465b01dd11 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxsb.c @@ -0,0 +1,46 @@ +/* { dg-do run } */ +/* { dg-require-effective-target powerpc_vsx_ok } */ +/* { dg-options "-O2 -mvsx" } */ + +#ifndef CHECK_H +#define CHECK_H "sse4_1-check.h" +#endif + +#ifndef TEST +#define TEST sse4_1_test +#endif + +#include CHECK_H + +#include + +#define NUM 1024 + +static void +TEST (void) +{ + union + { + __m128i x[NUM / 16]; + signed char i[NUM]; + } dst, src1, src2; + int i, sign = 1; + signed char max; + + for (i = 0; i < NUM; i++) + { + src1.i[i] = i * i * sign; + src2.i[i] = (i + 20) * sign; + sign = -sign; + } + + for (i = 0; i < NUM; i += 16) + dst.x[i / 16] = _mm_max_epi8 (src1.x[i / 16], src2.x[i / 16]); + + for (i = 0; i < NUM; i++) + { + max = src1.i[i] <= src2.i[i] ? src2.i[i] : src1.i[i]; + if (max != dst.i[i]) + abort (); + } +} diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxsd.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxsd.c new file mode 100644 index 000000000000..d4947e9dae9a --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxsd.c @@ -0,0 +1,46 @@ +/* { dg-do run } */ +/* { dg-require-effective-target powerpc_vsx_ok } */ +/* { dg-options "-O2 -mvsx" } */ + +#ifndef CHECK_H +#define CHECK_H "sse4_1-check.h" +#endif + +#ifndef TEST +#define TEST sse4_1_test +#endif + +#include CHECK_H + +#include + +#define NUM 64 + +static void +TEST (void) +{ + union + { + __m128i x[NUM / 4]; + int i[NUM]; + } dst, src1, src2; + int i, sign = 1; + int max; + + for (i = 0; i < NUM; i++) + { + src1.i[i] = i * i * sign; + src2.i[i] = (i + 20) * sign; + sign = -sign; + } + + for (i = 0; i < NUM; i += 4) + dst.x[i / 4] = _mm_max_epi32 (src1.x[i / 4], src2.x[i / 4]); + + for (i = 0; i < NUM; i++) + { + max = src1.i[i] <= src2.i[i] ? src2.i[i] : src1.i[i]; + if (max != dst.i[i]) + abort (); + } +} diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxud.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxud.c new file mode 100644 index 000000000000..1407ebccacd3 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxud.c @@ -0,0 +1,47 @@ +/* { dg-do run } */ +/* { dg-require-effective-target powerpc_vsx_ok } */ +/* { dg-options "-O2 -mvsx" } */ + +#ifndef CHECK_H +#define CHECK_H "sse4_1-check.h" +#endif + +#ifndef TEST +#define TEST sse4_1_test +#endif + +#include CHECK_H + +#include + +#define NUM 64 + +static void +TEST (void) +{ + union + { + __m128i x[NUM / 4]; + unsigned int i[NUM]; + } dst, src1, src2; + int i; + unsigned int max; + + for (i = 0; i < NUM; i++) + { + src1.i[i] = i * i; + src2.i[i] = i + 20; + if ((i % 4)) + src2.i[i] |= 0x80000000; + } + + for (i = 0; i < NUM; i += 4) + dst.x[i / 4] = _mm_max_epu32 (src1.x[i / 4], src2.x[i / 4]); + + for (i = 0; i < NUM; i++) + { + max = src1.i[i] <= src2.i[i] ? src2.i[i] : src1.i[i]; + if (max != dst.i[i]) + abort (); + } +} diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxuw.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxuw.c new file mode 100644 index 000000000000..73ead0e90683 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmaxuw.c @@ -0,0 +1,47 @@ +/* { dg-do run } */ +/* { dg-require-effective-target powerpc_vsx_ok } */ +/* { dg-options "-O2 -mvsx" } */ + +#ifndef CHECK_H +#define CHECK_H "sse4_1-check.h" +#endif + +#ifndef TEST +#define TEST sse4_1_test +#endif + +#include CHECK_H + +#include + +#define NUM 64 + +static void +TEST (void) +{ + union + { + __m128i x[NUM / 8]; + unsigned short i[NUM]; + } dst, src1, src2; + int i; + unsigned short max; + + for (i = 0; i < NUM; i++) + { + src1.i[i] = i * i; + src2.i[i] = i + 20; + if ((i % 8)) + src2.i[i] |= 0x8000; + } + + for (i = 0; i < NUM; i += 8) + dst.x[i / 8] = _mm_max_epu16 (src1.x[i / 8], src2.x[i / 8]); + + for (i = 0; i < NUM; i++) + { + max = src1.i[i] <= src2.i[i] ? src2.i[i] : src1.i[i]; + if (max != dst.i[i]) + abort (); + } +} diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-pminsb.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-pminsb.c new file mode 100644 index 000000000000..bf491b7d363d --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-pminsb.c @@ -0,0 +1,46 @@ +/* { dg-do run } */ +/* { dg-require-effective-target powerpc_vsx_ok } */ +/* { dg-options "-O2 -mvsx" } */ + +#ifndef CHECK_H +#define CHECK_H "sse4_1-check.h" +#endif + +#ifndef TEST +#define TEST sse4_1_test +#endif + +#include CHECK_H + +#include + +#define NUM 1024 + +static void +TEST (void) +{ + union + { + __m128i x[NUM / 16]; + signed char i[NUM]; + } dst, src1, src2; + int i, sign = 1; + signed char min; + + for (i = 0; i < NUM; i++) + { + src1.i[i] = i * i * sign; + src2.i[i] = (i + 20) * sign; + sign = -sign; + } + + for (i = 0; i < NUM; i += 16) + dst.x[i / 16] = _mm_min_epi8 (src1.x[i / 16], src2.x[i / 16]); + + for (i = 0; i < NUM; i++) + { + min = src1.i[i] >= src2.i[i] ? src2.i[i] : src1.i[i]; + if (min != dst.i[i]) + abort (); + } +} diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-pminsd.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-pminsd.c new file mode 100644 index 000000000000..6cb27556a3b0 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-pminsd.c @@ -0,0 +1,46 @@ +/* { dg-do run } */ +/* { dg-require-effective-target powerpc_vsx_ok } */ +/* { dg-options "-O2 -mvsx" } */ + +#ifndef CHECK_H +#define CHECK_H "sse4_1-check.h" +#endif + +#ifndef TEST +#define TEST sse4_1_test +#endif + +#include CHECK_H + +#include + +#define NUM 64 + +static void +TEST (void) +{ + union + { + __m128i x[NUM / 4]; + int i[NUM]; + } dst, src1, src2; + int i, sign = 1; + int min; + + for (i = 0; i < NUM; i++) + { + src1.i[i] = i * i * sign; + src2.i[i] = (i + 20) * sign; + sign = -sign; + } + + for (i = 0; i < NUM; i += 4) + dst.x[i / 4] = _mm_min_epi32 (src1.x[i / 4], src2.x[i / 4]); + + for (i = 0; i < NUM; i++) + { + min = src1.i[i] >= src2.i[i] ? src2.i[i] : src1.i[i]; + if (min != dst.i[i]) + abort (); + } +} diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-pminud.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-pminud.c new file mode 100644 index 000000000000..afda4b906599 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-pminud.c @@ -0,0 +1,47 @@ +/* { dg-do run } */ +/* { dg-require-effective-target powerpc_vsx_ok } */ +/* { dg-options "-O2 -mvsx" } */ + +#ifndef CHECK_H +#define CHECK_H "sse4_1-check.h" +#endif + +#ifndef TEST +#define TEST sse4_1_test +#endif + +#include CHECK_H + +#include + +#define NUM 64 + +static void +TEST (void) +{ + union + { + __m128i x[NUM / 4]; + unsigned int i[NUM]; + } dst, src1, src2; + int i; + unsigned int min; + + for (i = 0; i < NUM; i++) + { + src1.i[i] = i * i; + src2.i[i] = i + 20; + if ((i % 4)) + src2.i[i] |= 0x80000000; + } + + for (i = 0; i < NUM; i += 4) + dst.x[i / 4] = _mm_min_epu32 (src1.x[i / 4], src2.x[i / 4]); + + for (i = 0; i < NUM; i++) + { + min = src1.i[i] >= src2.i[i] ? src2.i[i] : src1.i[i]; + if (min != dst.i[i]) + abort (); + } +} diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-pminuw.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-pminuw.c new file mode 100644 index 000000000000..25cc115285c6 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-pminuw.c @@ -0,0 +1,47 @@ +/* { dg-do run } */ +/* { dg-require-effective-target powerpc_vsx_ok } */ +/* { dg-options "-O2 -mvsx" } */ + +#ifndef CHECK_H +#define CHECK_H "sse4_1-check.h" +#endif + +#ifndef TEST +#define TEST sse4_1_test +#endif + +#include CHECK_H + +#include + +#define NUM 64 + +static void +TEST (void) +{ + union + { + __m128i x[NUM / 8]; + unsigned short i[NUM]; + } dst, src1, src2; + int i; + unsigned short min; + + for (i = 0; i < NUM; i++) + { + src1.i[i] = i * i; + src2.i[i] = i + 20; + if ((i % 8)) + src2.i[i] |= 0x8000; + } + + for (i = 0; i < NUM; i += 8) + dst.x[i / 8] = _mm_min_epu16 (src1.x[i / 8], src2.x[i / 8]); + + for (i = 0; i < NUM; i++) + { + min = src1.i[i] >= src2.i[i] ? src2.i[i] : src1.i[i]; + if (min != dst.i[i]) + abort (); + } +} From patchwork Fri Aug 20 19:27:23 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Paul A. Clarke" X-Patchwork-Id: 1519165 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=PwOfVemS; dkim-atps=neutral Received: from sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4GrsB91TH4z9sRf for ; Sat, 21 Aug 2021 05:28:31 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 42BC938A8812 for ; Fri, 20 Aug 2021 19:28:27 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 42BC938A8812 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1629487707; bh=6yZPEJSlFufkOtkUg1mxL5zecgBLBx5og6IiwY+Gmkk=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=PwOfVemSOkBt+WpM51dAOz4nxThjniLNRSMfqCvsChwi/FwTXfctP6/ZTUPyIexxf jnuR0McSCNznMkHA9/951UApb7n4zmrgExWwOroSR9e2NJpOtAGqL/xQWRkkEoFFE+ Qou0UKFMLyCv/38LC6q3Km4YAZuiW6l82ckshqMI= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mx0a-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by sourceware.org (Postfix) with ESMTPS id 3252B385C414 for ; Fri, 20 Aug 2021 19:27:44 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 3252B385C414 Received: from pps.filterd (m0098419.ppops.net [127.0.0.1]) by mx0b-001b2d01.pphosted.com (8.16.0.43/8.16.0.43) with SMTP id 17KJ3JMJ047227; Fri, 20 Aug 2021 15:27:43 -0400 Received: from ppma04dal.us.ibm.com (7a.29.35a9.ip4.static.sl-reverse.com [169.53.41.122]) by mx0b-001b2d01.pphosted.com with ESMTP id 3aj47nxku5-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 20 Aug 2021 15:27:43 -0400 Received: from pps.filterd (ppma04dal.us.ibm.com [127.0.0.1]) by ppma04dal.us.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 17KJRTos012132; Fri, 20 Aug 2021 19:27:43 GMT Received: from b01cxnp23032.gho.pok.ibm.com (b01cxnp23032.gho.pok.ibm.com [9.57.198.27]) by ppma04dal.us.ibm.com with ESMTP id 3ae5fhtvsx-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 20 Aug 2021 19:27:43 +0000 Received: from b01ledav002.gho.pok.ibm.com (b01ledav002.gho.pok.ibm.com [9.57.199.107]) by b01cxnp23032.gho.pok.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 17KJRfxu53871060 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 20 Aug 2021 19:27:42 GMT Received: from b01ledav002.gho.pok.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id DBBCE12405E; Fri, 20 Aug 2021 19:27:41 +0000 (GMT) Received: from b01ledav002.gho.pok.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id B52EA124055; Fri, 20 Aug 2021 19:27:41 +0000 (GMT) Received: from localhost (unknown [9.77.137.12]) by b01ledav002.gho.pok.ibm.com (Postfix) with ESMTP; Fri, 20 Aug 2021 19:27:41 +0000 (GMT) To: gcc-patches@gcc.gnu.org Subject: [PATCH v2 3/6] rs6000: Simplify some SSE4.1 "test" intrinsics Date: Fri, 20 Aug 2021 14:27:23 -0500 Message-Id: <20210820192726.1575524-4-pc@us.ibm.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20210820192726.1575524-1-pc@us.ibm.com> References: <20210820192726.1575524-1-pc@us.ibm.com> MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Proofpoint-GUID: tWtpRAr8K7LGq4MdpnJjurWG5XeUXZnr X-Proofpoint-ORIG-GUID: tWtpRAr8K7LGq4MdpnJjurWG5XeUXZnr X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.391, 18.0.790 definitions=2021-08-20_08:2021-08-20, 2021-08-20 signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 adultscore=0 priorityscore=1501 suspectscore=0 mlxscore=0 mlxlogscore=869 lowpriorityscore=0 impostorscore=0 bulkscore=0 spamscore=0 malwarescore=0 clxscore=1015 phishscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2107140000 definitions=main-2108200107 X-Spam-Status: No, score=-12.2 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: "Paul A. Clarke via Gcc-patches" From: "Paul A. Clarke" Reply-To: "Paul A. Clarke" Cc: wschmidt@linux.ibm.com, segher@kernel.crashing.org Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" Copy some simple redirections from i386 , for: - _mm_test_all_zeros - _mm_test_all_ones - _mm_test_mix_ones_zeros 2021-08-20 Paul A. Clarke gcc * config/rs6000/smmintrin.h (_mm_test_all_zeros, _mm_test_all_ones, _mm_test_mix_ones_zeros): Replace. --- v2: - Removed "-Wno-psabi" from tests as unnecessary, per v1 review. - Noted testing in patch series cover letter. gcc/config/rs6000/smmintrin.h | 30 ++++-------------------------- 1 file changed, 4 insertions(+), 26 deletions(-) diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h index 505fe4ce22a8..363534cb06a2 100644 --- a/gcc/config/rs6000/smmintrin.h +++ b/gcc/config/rs6000/smmintrin.h @@ -379,34 +379,12 @@ _mm_testnzc_si128 (__m128i __A, __m128i __B) return _mm_testz_si128 (__A, __B) == 0 && _mm_testc_si128 (__A, __B) == 0; } -__inline int -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_test_all_zeros (__m128i __A, __m128i __mask) -{ - const __v16qu __zero = {0}; - return vec_all_eq (vec_and ((__v16qu) __A, (__v16qu) __mask), __zero); -} +#define _mm_test_all_zeros(M, V) _mm_testz_si128 ((M), (V)) -__inline int -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_test_all_ones (__m128i __A) -{ - const __v16qu __ones = vec_splats ((unsigned char) 0xff); - return vec_all_eq ((__v16qu) __A, __ones); -} +#define _mm_test_all_ones(V) \ + _mm_testc_si128 ((V), _mm_cmpeq_epi32 ((V), (V))) -__inline int -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm_test_mix_ones_zeros (__m128i __A, __m128i __mask) -{ - const __v16qu __zero = {0}; - const __v16qu __Amasked = vec_and ((__v16qu) __A, (__v16qu) __mask); - const int any_ones = vec_any_ne (__Amasked, __zero); - const __v16qu __notA = vec_nor ((__v16qu) __A, (__v16qu) __A); - const __v16qu __notAmasked = vec_and ((__v16qu) __notA, (__v16qu) __mask); - const int any_zeros = vec_any_ne (__notAmasked, __zero); - return any_ones * any_zeros; -} +#define _mm_test_mix_ones_zeros(M, V) _mm_testnzc_si128 ((M), (V)) extern __inline __m128i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) From patchwork Fri Aug 20 19:27:24 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Paul A. Clarke" X-Patchwork-Id: 1519169 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=nZQdxTlO; dkim-atps=neutral Received: from sourceware.org (server2.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4GrsGr2RL5z9sRf for ; Sat, 21 Aug 2021 05:32:36 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 0BCD638A882C for ; Fri, 20 Aug 2021 19:32:34 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 0BCD638A882C DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1629487954; bh=/VX2zoJiXrbDTH5lcjFs4ykZjATS//YBMtVvup27rQg=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=nZQdxTlOHbpd+AM0Au0Uv4bIbTP8G9UJPRiXjiARxhhmCH3DuFAwRv9xB4zsy4olC LE7ZFpvytUd5iuQlAbTGsTlPIZCQ8lyvMKq3ceandv9Wpiiu34jkNLknXO8i6CbGcs Z67C3q5OixzS6Z5Elq6hwctUuzwsEHjayfyoi2d8= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by sourceware.org (Postfix) with ESMTPS id 18B4D3853807 for ; Fri, 20 Aug 2021 19:27:46 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 18B4D3853807 Received: from pps.filterd (m0098410.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.43/8.16.0.43) with SMTP id 17KJ6WJE006931; Fri, 20 Aug 2021 15:27:45 -0400 Received: from ppma03wdc.us.ibm.com (ba.79.3fa9.ip4.static.sl-reverse.com [169.63.121.186]) by mx0a-001b2d01.pphosted.com with ESMTP id 3ajetknngk-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 20 Aug 2021 15:27:44 -0400 Received: from pps.filterd (ppma03wdc.us.ibm.com [127.0.0.1]) by ppma03wdc.us.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 17KJR7TP021545; Fri, 20 Aug 2021 19:27:43 GMT Received: from b01cxnp23033.gho.pok.ibm.com (b01cxnp23033.gho.pok.ibm.com [9.57.198.28]) by ppma03wdc.us.ibm.com with ESMTP id 3ae5fgcnwd-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 20 Aug 2021 19:27:43 +0000 Received: from b01ledav003.gho.pok.ibm.com (b01ledav003.gho.pok.ibm.com [9.57.199.108]) by b01cxnp23033.gho.pok.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 17KJRgqx37749122 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 20 Aug 2021 19:27:42 GMT Received: from b01ledav003.gho.pok.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id BF0F6B2065; Fri, 20 Aug 2021 19:27:42 +0000 (GMT) Received: from b01ledav003.gho.pok.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 81172B2064; Fri, 20 Aug 2021 19:27:42 +0000 (GMT) Received: from localhost (unknown [9.77.137.12]) by b01ledav003.gho.pok.ibm.com (Postfix) with ESMTP; Fri, 20 Aug 2021 19:27:42 +0000 (GMT) To: gcc-patches@gcc.gnu.org Subject: [PATCH v2 4/6] rs6000: Support SSE4.1 "cvt" intrinsics Date: Fri, 20 Aug 2021 14:27:24 -0500 Message-Id: <20210820192726.1575524-5-pc@us.ibm.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20210820192726.1575524-1-pc@us.ibm.com> References: <20210820192726.1575524-1-pc@us.ibm.com> MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: 54EshqXzhi8SuptLpUqnpXiSU34V-rDC X-Proofpoint-GUID: 54EshqXzhi8SuptLpUqnpXiSU34V-rDC X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.391, 18.0.790 definitions=2021-08-20_08:2021-08-20, 2021-08-20 signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 impostorscore=0 mlxscore=0 priorityscore=1501 phishscore=0 lowpriorityscore=0 spamscore=0 suspectscore=0 malwarescore=0 adultscore=0 bulkscore=0 clxscore=1015 mlxlogscore=696 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2107140000 definitions=main-2108200107 X-Spam-Status: No, score=-12.2 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: "Paul A. Clarke via Gcc-patches" From: "Paul A. Clarke" Reply-To: "Paul A. Clarke" Cc: wschmidt@linux.ibm.com, segher@kernel.crashing.org Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" Function signatures and decorations match gcc/config/i386/smmintrin.h. Also, copy tests for: - _mm_cvtepi8_epi16, _mm_cvtepi8_epi32, _mm_cvtepi8_epi64 - _mm_cvtepi16_epi32, _mm_cvtepi16_epi64 - _mm_cvtepi32_epi64, - _mm_cvtepu8_epi16, _mm_cvtepu8_epi32, _mm_cvtepu8_epi64 - _mm_cvtepu16_epi32, _mm_cvtepu16_epi64 - _mm_cvtepu32_epi64 from gcc/testsuite/gcc.target/i386. sse4_1-pmovsxbd.c, sse4_1-pmovsxbq.c, and sse4_1-pmovsxbw.c were modified from using "char" types to "signed char" types, because the default is unsigned on powerpc. 2021-08-20 Paul A. Clarke gcc * config/rs6000/smmintrin.h (_mm_cvtepi8_epi16, _mm_cvtepi8_epi32, _mm_cvtepi8_epi64, _mm_cvtepi16_epi32, _mm_cvtepi16_epi64, _mm_cvtepi32_epi64, _mm_cvtepu8_epi16, _mm_cvtepu8_epi32, _mm_cvtepu8_epi64, _mm_cvtepu16_epi32, _mm_cvtepu16_epi64, _mm_cvtepu32_epi64): New. gcc/testsuite * gcc.target/powerpc/sse4_1-pmovsxbd.c: Copy from gcc.target/i386, adjust dg directives to suit. * gcc.target/powerpc/sse4_1-pmovsxbq.c: Same. * gcc.target/powerpc/sse4_1-pmovsxbw.c: Same. * gcc.target/powerpc/sse4_1-pmovsxdq.c: Same. * gcc.target/powerpc/sse4_1-pmovsxwd.c: Same. * gcc.target/powerpc/sse4_1-pmovsxwq.c: Same. * gcc.target/powerpc/sse4_1-pmovzxbd.c: Same. * gcc.target/powerpc/sse4_1-pmovzxbq.c: Same. * gcc.target/powerpc/sse4_1-pmovzxbw.c: Same. * gcc.target/powerpc/sse4_1-pmovzxdq.c: Same. * gcc.target/powerpc/sse4_1-pmovzxwd.c: Same. * gcc.target/powerpc/sse4_1-pmovzxwq.c: Same. --- v2: - Added "extern" to functions to maintain compatible decorations with like implementations in gcc/config/i386. - Removed "-Wno-psabi" from tests as unnecessary, per v1 review. - Noted testing in patch series cover letter. gcc/config/rs6000/smmintrin.h | 138 ++++++++++++++++++ .../gcc.target/powerpc/sse4_1-pmovsxbd.c | 42 ++++++ .../gcc.target/powerpc/sse4_1-pmovsxbq.c | 42 ++++++ .../gcc.target/powerpc/sse4_1-pmovsxbw.c | 42 ++++++ .../gcc.target/powerpc/sse4_1-pmovsxdq.c | 42 ++++++ .../gcc.target/powerpc/sse4_1-pmovsxwd.c | 42 ++++++ .../gcc.target/powerpc/sse4_1-pmovsxwq.c | 42 ++++++ .../gcc.target/powerpc/sse4_1-pmovzxbd.c | 43 ++++++ .../gcc.target/powerpc/sse4_1-pmovzxbq.c | 43 ++++++ .../gcc.target/powerpc/sse4_1-pmovzxbw.c | 43 ++++++ .../gcc.target/powerpc/sse4_1-pmovzxdq.c | 43 ++++++ .../gcc.target/powerpc/sse4_1-pmovzxwd.c | 43 ++++++ .../gcc.target/powerpc/sse4_1-pmovzxwq.c | 43 ++++++ 13 files changed, 648 insertions(+) create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxbd.c create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxbq.c create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxbw.c create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxdq.c create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxwd.c create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxwq.c create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxbd.c create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxbq.c create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxbw.c create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxdq.c create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxwd.c create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxwq.c diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h index 363534cb06a2..fdef6674d16c 100644 --- a/gcc/config/rs6000/smmintrin.h +++ b/gcc/config/rs6000/smmintrin.h @@ -442,6 +442,144 @@ _mm_max_epu32 (__m128i __X, __m128i __Y) return (__m128i) vec_max ((__v4su)__X, (__v4su)__Y); } +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtepi8_epi16 (__m128i __A) +{ + return (__m128i) vec_unpackh ((__v16qi)__A); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtepi8_epi32 (__m128i __A) +{ + __A = (__m128i) vec_unpackh ((__v16qi)__A); + return (__m128i) vec_unpackh ((__v8hi)__A); +} + +#ifdef _ARCH_PWR8 +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtepi8_epi64 (__m128i __A) +{ + __A = (__m128i) vec_unpackh ((__v16qi)__A); + __A = (__m128i) vec_unpackh ((__v8hi)__A); + return (__m128i) vec_unpackh ((__v4si)__A); +} +#endif + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtepi16_epi32 (__m128i __A) +{ + return (__m128i) vec_unpackh ((__v8hi)__A); +} + +#ifdef _ARCH_PWR8 +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtepi16_epi64 (__m128i __A) +{ + __A = (__m128i) vec_unpackh ((__v8hi)__A); + return (__m128i) vec_unpackh ((__v4si)__A); +} +#endif + +#ifdef _ARCH_PWR8 +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtepi32_epi64 (__m128i __A) +{ + return (__m128i) vec_unpackh ((__v4si)__A); +} +#endif + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtepu8_epi16 (__m128i __A) +{ + const __v16qu __zero = {0}; +#ifdef __LITTLE_ENDIAN__ + __A = (__m128i) vec_mergeh ((__v16qu)__A, __zero); +#else /* __BIG_ENDIAN__. */ + __A = (__m128i) vec_mergeh (__zero, (__v16qu)__A); +#endif /* __BIG_ENDIAN__. */ + return __A; +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtepu8_epi32 (__m128i __A) +{ + const __v16qu __zero = {0}; +#ifdef __LITTLE_ENDIAN__ + __A = (__m128i) vec_mergeh ((__v16qu)__A, __zero); + __A = (__m128i) vec_mergeh ((__v8hu)__A, (__v8hu)__zero); +#else /* __BIG_ENDIAN__. */ + __A = (__m128i) vec_mergeh (__zero, (__v16qu)__A); + __A = (__m128i) vec_mergeh ((__v8hu)__zero, (__v8hu)__A); +#endif /* __BIG_ENDIAN__. */ + return __A; +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtepu8_epi64 (__m128i __A) +{ + const __v16qu __zero = {0}; +#ifdef __LITTLE_ENDIAN__ + __A = (__m128i) vec_mergeh ((__v16qu)__A, __zero); + __A = (__m128i) vec_mergeh ((__v8hu)__A, (__v8hu)__zero); + __A = (__m128i) vec_mergeh ((__v4su)__A, (__v4su)__zero); +#else /* __BIG_ENDIAN__. */ + __A = (__m128i) vec_mergeh (__zero, (__v16qu)__A); + __A = (__m128i) vec_mergeh ((__v8hu)__zero, (__v8hu)__A); + __A = (__m128i) vec_mergeh ((__v4su)__zero, (__v4su)__A); +#endif /* __BIG_ENDIAN__. */ + return __A; +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtepu16_epi32 (__m128i __A) +{ + const __v8hu __zero = {0}; +#ifdef __LITTLE_ENDIAN__ + __A = (__m128i) vec_mergeh ((__v8hu)__A, __zero); +#else /* __BIG_ENDIAN__. */ + __A = (__m128i) vec_mergeh (__zero, (__v8hu)__A); +#endif /* __BIG_ENDIAN__. */ + return __A; +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtepu16_epi64 (__m128i __A) +{ + const __v8hu __zero = {0}; +#ifdef __LITTLE_ENDIAN__ + __A = (__m128i) vec_mergeh ((__v8hu)__A, __zero); + __A = (__m128i) vec_mergeh ((__v4su)__A, (__v4su)__zero); +#else /* __BIG_ENDIAN__. */ + __A = (__m128i) vec_mergeh (__zero, (__v8hu)__A); + __A = (__m128i) vec_mergeh ((__v4su)__zero, (__v4su)__A); +#endif /* __BIG_ENDIAN__. */ + return __A; +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtepu32_epi64 (__m128i __A) +{ + const __v4su __zero = {0}; +#ifdef __LITTLE_ENDIAN__ + __A = (__m128i) vec_mergeh ((__v4su)__A, __zero); +#else /* __BIG_ENDIAN__. */ + __A = (__m128i) vec_mergeh (__zero, (__v4su)__A); +#endif /* __BIG_ENDIAN__. */ + return __A; +} + /* Return horizontal packed word minimum and its index in bits [15:0] and bits [18:16] respectively. */ __inline __m128i diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxbd.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxbd.c new file mode 100644 index 000000000000..553c8dd84505 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxbd.c @@ -0,0 +1,42 @@ +/* { dg-do run } */ +/* { dg-require-effective-target powerpc_vsx_ok } */ +/* { dg-options "-O2 -mvsx" } */ + +#ifndef CHECK_H +#define CHECK_H "sse4_1-check.h" +#endif + +#ifndef TEST +#define TEST sse4_1_test +#endif + +#include CHECK_H + +#include + +#define NUM 128 + +static void +TEST (void) +{ + union + { + __m128i x[NUM / 4]; + int i[NUM]; + signed char c[NUM * 4]; + } dst, src; + int i, sign = 1; + + for (i = 0; i < NUM; i++) + { + src.c[(i % 4) + (i / 4) * 16] = i * i * sign; + sign = -sign; + } + + for (i = 0; i < NUM; i += 4) + dst.x [i / 4] = _mm_cvtepi8_epi32 (src.x [i / 4]); + + for (i = 0; i < NUM; i++) + if (src.c[(i % 4) + (i / 4) * 16] != dst.i[i]) + abort (); +} diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxbq.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxbq.c new file mode 100644 index 000000000000..9ec1ab7a4169 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxbq.c @@ -0,0 +1,42 @@ +/* { dg-do run } */ +/* { dg-require-effective-target p8vector_hw } */ +/* { dg-options "-O2 -mpower8-vector" } */ + +#ifndef CHECK_H +#define CHECK_H "sse4_1-check.h" +#endif + +#ifndef TEST +#define TEST sse4_1_test +#endif + +#include CHECK_H + +#include + +#define NUM 128 + +static void +TEST (void) +{ + union + { + __m128i x[NUM / 2]; + long long ll[NUM]; + signed char c[NUM * 8]; + } dst, src; + int i, sign = 1; + + for (i = 0; i < NUM; i++) + { + src.c[(i % 2) + (i / 2) * 16] = i * i * sign; + sign = -sign; + } + + for (i = 0; i < NUM; i += 2) + dst.x [i / 2] = _mm_cvtepi8_epi64 (src.x [i / 2]); + + for (i = 0; i < NUM; i++) + if (src.c[(i % 2) + (i / 2) * 16] != dst.ll[i]) + abort (); +} diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxbw.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxbw.c new file mode 100644 index 000000000000..be4cf417ca7e --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxbw.c @@ -0,0 +1,42 @@ +/* { dg-do run } */ +/* { dg-require-effective-target powerpc_vsx_ok } */ +/* { dg-options "-O2 -mvsx" } */ + +#ifndef CHECK_H +#define CHECK_H "sse4_1-check.h" +#endif + +#ifndef TEST +#define TEST sse4_1_test +#endif + +#include CHECK_H + +#include + +#define NUM 128 + +static void +TEST (void) +{ + union + { + __m128i x[NUM / 8]; + short s[NUM]; + signed char c[NUM * 2]; + } dst, src; + int i, sign = 1; + + for (i = 0; i < NUM; i++) + { + src.c[(i % 8) + (i / 8) * 16] = i * i * sign; + sign = -sign; + } + + for (i = 0; i < NUM; i += 8) + dst.x [i / 8] = _mm_cvtepi8_epi16 (src.x [i / 8]); + + for (i = 0; i < NUM; i++) + if (src.c[(i % 8) + (i / 8) * 16] != dst.s[i]) + abort (); +} diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxdq.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxdq.c new file mode 100644 index 000000000000..1c263782240a --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxdq.c @@ -0,0 +1,42 @@ +/* { dg-do run } */ +/* { dg-require-effective-target p8vector_hw } */ +/* { dg-options "-O2 -mpower8-vector" } */ + +#ifndef CHECK_H +#define CHECK_H "sse4_1-check.h" +#endif + +#ifndef TEST +#define TEST sse4_1_test +#endif + +#include CHECK_H + +#include + +#define NUM 128 + +static void +TEST (void) +{ + union + { + __m128i x[NUM / 2]; + long long ll[NUM]; + int i[NUM * 2]; + } dst, src; + int i, sign = 1; + + for (i = 0; i < NUM; i++) + { + src.i[(i % 2) + (i / 2) * 4] = i * i * sign; + sign = -sign; + } + + for (i = 0; i < NUM; i += 2) + dst.x [i / 2] = _mm_cvtepi32_epi64 (src.x [i / 2]); + + for (i = 0; i < NUM; i++) + if (src.i[(i % 2) + (i / 2) * 4] != dst.ll[i]) + abort (); +} diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxwd.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxwd.c new file mode 100644 index 000000000000..f0f31aba44ba --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxwd.c @@ -0,0 +1,42 @@ +/* { dg-do run } */ +/* { dg-require-effective-target powerpc_vsx_ok } */ +/* { dg-options "-O2 -mvsx" } */ + +#ifndef CHECK_H +#define CHECK_H "sse4_1-check.h" +#endif + +#ifndef TEST +#define TEST sse4_1_test +#endif + +#include CHECK_H + +#include + +#define NUM 128 + +static void +TEST (void) +{ + union + { + __m128i x[NUM / 4]; + int i[NUM]; + short s[NUM * 2]; + } dst, src; + int i, sign = 1; + + for (i = 0; i < NUM; i++) + { + src.s[(i % 4) + (i / 4) * 8] = i * i * sign; + sign = -sign; + } + + for (i = 0; i < NUM; i += 4) + dst.x [i / 4] = _mm_cvtepi16_epi32 (src.x [i / 4]); + + for (i = 0; i < NUM; i++) + if (src.s[(i % 4) + (i / 4) * 8] != dst.i[i]) + abort (); +} diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxwq.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxwq.c new file mode 100644 index 000000000000..67864695a113 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxwq.c @@ -0,0 +1,42 @@ +/* { dg-do run } */ +/* { dg-require-effective-target p8vector_hw } */ +/* { dg-options "-O2 -mpower8-vector" } */ + +#ifndef CHECK_H +#define CHECK_H "sse4_1-check.h" +#endif + +#ifndef TEST +#define TEST sse4_1_test +#endif + +#include CHECK_H + +#include + +#define NUM 128 + +static void +TEST (void) +{ + union + { + __m128i x[NUM / 2]; + long long ll[NUM]; + short s[NUM * 4]; + } dst, src; + int i, sign = 1; + + for (i = 0; i < NUM; i++) + { + src.s[(i % 2) + (i / 2) * 8] = i * i * sign; + sign = -sign; + } + + for (i = 0; i < NUM; i += 2) + dst.x [i / 2] = _mm_cvtepi16_epi64 (src.x [i / 2]); + + for (i = 0; i < NUM; i++) + if (src.s[(i % 2) + (i / 2) * 8] != dst.ll[i]) + abort (); +} diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxbd.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxbd.c new file mode 100644 index 000000000000..098ef6a49cb0 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxbd.c @@ -0,0 +1,43 @@ +/* { dg-do run } */ +/* { dg-require-effective-target powerpc_vsx_ok } */ +/* { dg-options "-O2 -mvsx" } */ + +#ifndef CHECK_H +#define CHECK_H "sse4_1-check.h" +#endif + +#ifndef TEST +#define TEST sse4_1_test +#endif + +#include CHECK_H + +#include + +#define NUM 128 + +static void +TEST (void) +{ + union + { + __m128i x[NUM / 4]; + unsigned int i[NUM]; + unsigned char c[NUM * 4]; + } dst, src; + int i; + + for (i = 0; i < NUM; i++) + { + src.c[(i % 4) + (i / 4) * 16] = i * i; + if ((i % 4)) + src.c[(i % 4) + (i / 4) * 16] |= 0x80; + } + + for (i = 0; i < NUM; i += 4) + dst.x [i / 4] = _mm_cvtepu8_epi32 (src.x [i / 4]); + + for (i = 0; i < NUM; i++) + if (src.c[(i % 4) + (i / 4) * 16] != dst.i[i]) + abort (); +} diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxbq.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxbq.c new file mode 100644 index 000000000000..7b862767436e --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxbq.c @@ -0,0 +1,43 @@ +/* { dg-do run } */ +/* { dg-require-effective-target powerpc_vsx_ok } */ +/* { dg-options "-O2 -mvsx" } */ + +#ifndef CHECK_H +#define CHECK_H "sse4_1-check.h" +#endif + +#ifndef TEST +#define TEST sse4_1_test +#endif + +#include CHECK_H + +#include + +#define NUM 128 + +static void +TEST (void) +{ + union + { + __m128i x[NUM / 2]; + unsigned long long ll[NUM]; + unsigned char c[NUM * 8]; + } dst, src; + int i; + + for (i = 0; i < NUM; i++) + { + src.c[(i % 2) + (i / 2) * 16] = i * i; + if ((i % 2)) + src.c[(i % 2) + (i / 2) * 16] |= 0x80; + } + + for (i = 0; i < NUM; i += 2) + dst.x [i / 2] = _mm_cvtepu8_epi64 (src.x [i / 2]); + + for (i = 0; i < NUM; i++) + if (src.c[(i % 2) + (i / 2) * 16] != dst.ll[i]) + abort (); +} diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxbw.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxbw.c new file mode 100644 index 000000000000..9fdbec342d46 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxbw.c @@ -0,0 +1,43 @@ +/* { dg-do run } */ +/* { dg-require-effective-target powerpc_vsx_ok } */ +/* { dg-options "-O2 -mvsx" } */ + +#ifndef CHECK_H +#define CHECK_H "sse4_1-check.h" +#endif + +#ifndef TEST +#define TEST sse4_1_test +#endif + +#include CHECK_H + +#include + +#define NUM 128 + +static void +TEST (void) +{ + union + { + __m128i x[NUM / 8]; + unsigned short s[NUM]; + unsigned char c[NUM * 2]; + } dst, src; + int i; + + for (i = 0; i < NUM; i++) + { + src.c[(i % 8) + (i / 8) * 16] = i * i; + if ((i % 4)) + src.c[(i % 8) + (i / 8) * 16] |= 0x80; + } + + for (i = 0; i < NUM; i += 8) + dst.x [i / 8] = _mm_cvtepu8_epi16 (src.x [i / 8]); + + for (i = 0; i < NUM; i++) + if (src.c[(i % 8) + (i / 8) * 16] != dst.s[i]) + abort (); +} diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxdq.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxdq.c new file mode 100644 index 000000000000..7a5e7688d9f5 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxdq.c @@ -0,0 +1,43 @@ +/* { dg-do run } */ +/* { dg-require-effective-target powerpc_vsx_ok } */ +/* { dg-options "-O2 -mvsx" } */ + +#ifndef CHECK_H +#define CHECK_H "sse4_1-check.h" +#endif + +#ifndef TEST +#define TEST sse4_1_test +#endif + +#include CHECK_H + +#include + +#define NUM 128 + +static void +TEST (void) +{ + union + { + __m128i x[NUM / 2]; + unsigned long long ll[NUM]; + unsigned int i[NUM * 2]; + } dst, src; + int i; + + for (i = 0; i < NUM; i++) + { + src.i[(i % 2) + (i / 2) * 4] = i * i; + if ((i % 2)) + src.i[(i % 2) + (i / 2) * 4] |= 0x80000000; + } + + for (i = 0; i < NUM; i += 2) + dst.x [i / 2] = _mm_cvtepu32_epi64 (src.x [i / 2]); + + for (i = 0; i < NUM; i++) + if (src.i[(i % 2) + (i / 2) * 4] != dst.ll[i]) + abort (); +} diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxwd.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxwd.c new file mode 100644 index 000000000000..078a5a45d909 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxwd.c @@ -0,0 +1,43 @@ +/* { dg-do run } */ +/* { dg-require-effective-target powerpc_vsx_ok } */ +/* { dg-options "-O2 -mvsx" } */ + +#ifndef CHECK_H +#define CHECK_H "sse4_1-check.h" +#endif + +#ifndef TEST +#define TEST sse4_1_test +#endif + +#include CHECK_H + +#include + +#define NUM 128 + +static void +TEST (void) +{ + union + { + __m128i x[NUM / 4]; + unsigned int i[NUM]; + unsigned short s[NUM * 2]; + } dst, src; + int i; + + for (i = 0; i < NUM; i++) + { + src.s[(i % 4) + (i / 4) * 8] = i * i; + if ((i % 4)) + src.s[(i % 4) + (i / 4) * 8] |= 0x8000; + } + + for (i = 0; i < NUM; i += 4) + dst.x [i / 4] = _mm_cvtepu16_epi32 (src.x [i / 4]); + + for (i = 0; i < NUM; i++) + if (src.s[(i % 4) + (i / 4) * 8] != dst.i[i]) + abort (); +} diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxwq.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxwq.c new file mode 100644 index 000000000000..120d00290faa --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxwq.c @@ -0,0 +1,43 @@ +/* { dg-do run } */ +/* { dg-require-effective-target powerpc_vsx_ok } */ +/* { dg-options "-O2 -mvsx" } */ + +#ifndef CHECK_H +#define CHECK_H "sse4_1-check.h" +#endif + +#ifndef TEST +#define TEST sse4_1_test +#endif + +#include CHECK_H + +#include + +#define NUM 128 + +static void +TEST (void) +{ + union + { + __m128i x[NUM / 2]; + unsigned long long ll[NUM]; + unsigned short s[NUM * 4]; + } dst, src; + int i; + + for (i = 0; i < NUM; i++) + { + src.s[(i % 2) + (i / 2) * 8] = i * i; + if ((i % 2)) + src.s[(i % 2) + (i / 2) * 8] |= 0x8000; + } + + for (i = 0; i < NUM; i += 2) + dst.x [i / 2] = _mm_cvtepu16_epi64 (src.x [i / 2]); + + for (i = 0; i < NUM; i++) + if (src.s[(i % 2) + (i / 2) * 8] != dst.ll[i]) + abort (); +} From patchwork Fri Aug 20 19:27:25 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Paul A. Clarke" X-Patchwork-Id: 1519166 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=NiIMZ7Dn; dkim-atps=neutral Received: from sourceware.org (server2.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4GrsCG3YYDz9sRf for ; Sat, 21 Aug 2021 05:29:30 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 539993853808 for ; Fri, 20 Aug 2021 19:29:27 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 539993853808 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1629487767; bh=OdPIjO2Rj2OwqFZ6gdhB0ZAvlpEGG3DOd/cy+aOhDlU=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=NiIMZ7DniPkQC2Jofv775B9HUbuvsSKJZ0okwvszClcYkL3goB75EsBZJmmfcKKZh JfzgjxKeHDipuPkajVMMyZ6AcQxf3peRdeSORdPYQA5lygQxc/LyIKXh0Nj6yaxVTv hsOdu95aPiOLWXSZ908pmffR1x741Pcx/YFyHVkw= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mx0a-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by sourceware.org (Postfix) with ESMTPS id D3751385C414 for ; Fri, 20 Aug 2021 19:27:45 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org D3751385C414 Received: from pps.filterd (m0098413.ppops.net [127.0.0.1]) by mx0b-001b2d01.pphosted.com (8.16.0.43/8.16.0.43) with SMTP id 17KJK0HX184459; Fri, 20 Aug 2021 15:27:45 -0400 Received: from ppma02dal.us.ibm.com (a.bd.3ea9.ip4.static.sl-reverse.com [169.62.189.10]) by mx0b-001b2d01.pphosted.com with ESMTP id 3ahvmspw13-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 20 Aug 2021 15:27:45 -0400 Received: from pps.filterd (ppma02dal.us.ibm.com [127.0.0.1]) by ppma02dal.us.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 17KJ8bcm016409; Fri, 20 Aug 2021 19:27:44 GMT Received: from b01cxnp22034.gho.pok.ibm.com (b01cxnp22034.gho.pok.ibm.com [9.57.198.24]) by ppma02dal.us.ibm.com with ESMTP id 3aeey103ve-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 20 Aug 2021 19:27:44 +0000 Received: from b01ledav005.gho.pok.ibm.com (b01ledav005.gho.pok.ibm.com [9.57.199.110]) by b01cxnp22034.gho.pok.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 17KJRhkW30081300 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 20 Aug 2021 19:27:43 GMT Received: from b01ledav005.gho.pok.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id A7F22AE062; Fri, 20 Aug 2021 19:27:43 +0000 (GMT) Received: from b01ledav005.gho.pok.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 66A89AE060; Fri, 20 Aug 2021 19:27:43 +0000 (GMT) Received: from localhost (unknown [9.77.137.12]) by b01ledav005.gho.pok.ibm.com (Postfix) with ESMTP; Fri, 20 Aug 2021 19:27:43 +0000 (GMT) To: gcc-patches@gcc.gnu.org Subject: [PATCH v2 5/6] rs6000: Support more SSE4.1 "cmp", "mul", "pack" intrinsics Date: Fri, 20 Aug 2021 14:27:25 -0500 Message-Id: <20210820192726.1575524-6-pc@us.ibm.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20210820192726.1575524-1-pc@us.ibm.com> References: <20210820192726.1575524-1-pc@us.ibm.com> MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: oJwaaCR1eYTBrzZ7LEHLr1Bqk8dSwqcQ X-Proofpoint-GUID: oJwaaCR1eYTBrzZ7LEHLr1Bqk8dSwqcQ X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.391, 18.0.790 definitions=2021-08-20_08:2021-08-20, 2021-08-20 signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 adultscore=0 phishscore=0 bulkscore=0 mlxscore=0 mlxlogscore=915 spamscore=0 impostorscore=0 malwarescore=0 priorityscore=1501 clxscore=1015 suspectscore=0 lowpriorityscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2107140000 definitions=main-2108200107 X-Spam-Status: No, score=-12.2 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: "Paul A. Clarke via Gcc-patches" From: "Paul A. Clarke" Reply-To: "Paul A. Clarke" Cc: wschmidt@linux.ibm.com, segher@kernel.crashing.org Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" Function signatures and decorations match gcc/config/i386/smmintrin.h. Also, copy tests for: - _mm_cmpeq_epi64, _mm_cmpgt_epi64 - _mm_mullo_epi32, _mm_mul_epi32 - _mm_packus_epi32 from gcc/testsuite/gcc.target/i386. 2021-08-20 Paul A. Clarke gcc * config/rs6000/smmintrin.h (_mm_cmpeq_epi64, _mm_cmpgt_epi64, _mm_mullo_epi32, _mm_mul_epi32, _mm_packus_epi32): New. gcc/testsuite * gcc.target/powerpc/pr78102.c: Copy from gcc.target/i386, adjust dg directives to suit. * gcc.target/powerpc/sse4_1-packusdw.c: Same. * gcc.target/powerpc/sse4_1-pcmpeqq.c: Same. * gcc.target/powerpc/sse4_1-pmuldq.c: Same. * gcc.target/powerpc/sse4_1-pmulld.c: Same. * gcc.target/powerpc/sse4_2-pcmpgtq.c: Same. --- v2: - Added "extern" to functions to maintain compatible decorations with like implementations in gcc/config/i386. - Removed "-Wno-psabi" from tests as unnecessary, per v1 review. - Noted testing in patch series cover letter. gcc/config/rs6000/smmintrin.h | 41 +++++++++++ gcc/testsuite/gcc.target/powerpc/pr78102.c | 23 ++++++ .../gcc.target/powerpc/sse4_1-packusdw.c | 73 +++++++++++++++++++ .../gcc.target/powerpc/sse4_1-pcmpeqq.c | 46 ++++++++++++ .../gcc.target/powerpc/sse4_1-pmuldq.c | 51 +++++++++++++ .../gcc.target/powerpc/sse4_1-pmulld.c | 46 ++++++++++++ .../gcc.target/powerpc/sse4_2-pcmpgtq.c | 46 ++++++++++++ 7 files changed, 326 insertions(+) create mode 100644 gcc/testsuite/gcc.target/powerpc/pr78102.c create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-packusdw.c create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pcmpeqq.c create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmuldq.c create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmulld.c create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_2-pcmpgtq.c diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h index fdef6674d16c..c04d2bb5b6d3 100644 --- a/gcc/config/rs6000/smmintrin.h +++ b/gcc/config/rs6000/smmintrin.h @@ -386,6 +386,15 @@ _mm_testnzc_si128 (__m128i __A, __m128i __B) #define _mm_test_mix_ones_zeros(M, V) _mm_testnzc_si128 ((M), (V)) +#ifdef _ARCH_PWR8 +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpeq_epi64 (__m128i __X, __m128i __Y) +{ + return (__m128i) vec_cmpeq ((__v2di)__X, (__v2di)__Y); +} +#endif + extern __inline __m128i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm_min_epi8 (__m128i __X, __m128i __Y) @@ -444,6 +453,22 @@ _mm_max_epu32 (__m128i __X, __m128i __Y) extern __inline __m128i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mullo_epi32 (__m128i __X, __m128i __Y) +{ + return (__m128i) vec_mul ((__v4su)__X, (__v4su)__Y); +} + +#ifdef _ARCH_PWR8 +__inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mul_epi32 (__m128i __X, __m128i __Y) +{ + return (__m128i) vec_mule ((__v4si)__X, (__v4si)__Y); +} +#endif + +__inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm_cvtepi8_epi16 (__m128i __A) { return (__m128i) vec_unpackh ((__v16qi)__A); @@ -607,4 +632,20 @@ _mm_minpos_epu16 (__m128i __A) return __r.__m; } +__inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_packus_epi32 (__m128i __X, __m128i __Y) +{ + return (__m128i) vec_packsu ((__v4si)__X, (__v4si)__Y); +} + +#ifdef _ARCH_PWR8 +__inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpgt_epi64 (__m128i __X, __m128i __Y) +{ + return (__m128i) vec_cmpgt ((__v2di)__X, (__v2di)__Y); +} +#endif + #endif diff --git a/gcc/testsuite/gcc.target/powerpc/pr78102.c b/gcc/testsuite/gcc.target/powerpc/pr78102.c new file mode 100644 index 000000000000..56a2d497bbff --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr78102.c @@ -0,0 +1,23 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mvsx" } */ +/* { dg-require-effective-target powerpc_vsx_ok } */ + +#include + +__m128i +foo (const __m128i x, const __m128i y) +{ + return _mm_cmpeq_epi64 (x, y); +} + +__v2di +bar (const __v2di x, const __v2di y) +{ + return x == y; +} + +__v2di +baz (const __v2di x, const __v2di y) +{ + return x != y; +} diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-packusdw.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-packusdw.c new file mode 100644 index 000000000000..15b8ca418f54 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-packusdw.c @@ -0,0 +1,73 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mvsx" } */ +/* { dg-require-effective-target powerpc_vsx_ok } */ + +#ifndef CHECK_H +#define CHECK_H "sse4_1-check.h" +#endif + +#ifndef TEST +#define TEST sse4_1_test +#endif + +#include CHECK_H + +#include + +#define NUM 64 + +static unsigned short +int_to_ushort (int iVal) +{ + unsigned short sVal; + + if (iVal < 0) + sVal = 0; + else if (iVal > 0xffff) + sVal = 0xffff; + else sVal = iVal; + + return sVal; +} + +static void +TEST (void) +{ + union + { + __m128i x[NUM / 4]; + int i[NUM]; + } src1, src2; + union + { + __m128i x[NUM / 4]; + unsigned short s[NUM * 2]; + } dst; + int i, sign = 1; + + for (i = 0; i < NUM; i++) + { + src1.i[i] = i * i * sign; + src2.i[i] = (i + 20) * sign; + sign = -sign; + } + + for (i = 0; i < NUM; i += 4) + dst.x[i / 4] = _mm_packus_epi32 (src1.x [i / 4], src2.x [i / 4]); + + for (i = 0; i < NUM; i ++) + { + int dstIndex; + unsigned short sVal; + + sVal = int_to_ushort (src1.i[i]); + dstIndex = (i % 4) + (i / 4) * 8; + if (sVal != dst.s[dstIndex]) + abort (); + + sVal = int_to_ushort (src2.i[i]); + dstIndex += 4; + if (sVal != dst.s[dstIndex]) + abort (); + } +} diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-pcmpeqq.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-pcmpeqq.c new file mode 100644 index 000000000000..39b9f01d64a4 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-pcmpeqq.c @@ -0,0 +1,46 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mpower8-vector" } */ +/* { dg-require-effective-target p8vector_hw } */ + +#ifndef CHECK_H +#define CHECK_H "sse4_1-check.h" +#endif + +#ifndef TEST +#define TEST sse4_1_test +#endif + +#include CHECK_H + +#include + +#define NUM 64 + +static void +TEST (void) +{ + union + { + __m128i x[NUM / 2]; + long long ll[NUM]; + } dst, src1, src2; + int i, sign=1; + long long is_eq; + + for (i = 0; i < NUM; i++) + { + src1.ll[i] = i * i * sign; + src2.ll[i] = (i + 20) * sign; + sign = -sign; + } + + for (i = 0; i < NUM; i += 2) + dst.x [i / 2] = _mm_cmpeq_epi64(src1.x [i / 2], src2.x [i / 2]); + + for (i = 0; i < NUM; i++) + { + is_eq = src1.ll[i] == src2.ll[i] ? 0xffffffffffffffffLL : 0LL; + if (is_eq != dst.ll[i]) + abort (); + } +} diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-pmuldq.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmuldq.c new file mode 100644 index 000000000000..6a884f46235f --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmuldq.c @@ -0,0 +1,51 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mpower8-vector" } */ +/* { dg-require-effective-target p8vector_hw } */ + +#ifndef CHECK_H +#define CHECK_H "sse4_1-check.h" +#endif + +#ifndef TEST +#define TEST sse4_1_test +#endif + +#include CHECK_H + +#include + +#define NUM 64 + +static void +TEST (void) +{ + union + { + __m128i x[NUM / 2]; + long long ll[NUM]; + } dst; + union + { + __m128i x[NUM / 2]; + int i[NUM * 2]; + } src1, src2; + int i, sign = 1; + long long value; + + for (i = 0; i < NUM * 2; i += 2) + { + src1.i[i] = i * i * sign; + src2.i[i] = (i + 20) * sign; + sign = -sign; + } + + for (i = 0; i < NUM; i += 2) + dst.x[i / 2] = _mm_mul_epi32 (src1.x[i / 2], src2.x[i / 2]); + + for (i = 0; i < NUM; i++) + { + value = (long long) src1.i[i * 2] * (long long) src2.i[i * 2]; + if (value != dst.ll[i]) + abort (); + } +} diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-pmulld.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmulld.c new file mode 100644 index 000000000000..150832915911 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmulld.c @@ -0,0 +1,46 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mvsx" } */ +/* { dg-require-effective-target powerpc_vsx_ok } */ + +#ifndef CHECK_H +#define CHECK_H "sse4_1-check.h" +#endif + +#ifndef TEST +#define TEST sse4_1_test +#endif + +#include CHECK_H + +#include + +#define NUM 64 + +static void +TEST (void) +{ + union + { + __m128i x[NUM / 4]; + int i[NUM]; + } dst, src1, src2; + int i, sign = 1; + int value; + + for (i = 0; i < NUM; i++) + { + src1.i[i] = i * i * sign; + src2.i[i] = (i + 20) * sign; + sign = -sign; + } + + for (i = 0; i < NUM; i += 4) + dst.x[i / 4] = _mm_mullo_epi32 (src1.x[i / 4], src2.x[i / 4]); + + for (i = 0; i < NUM; i++) + { + value = src1.i[i] * src2.i[i]; + if (value != dst.i[i]) + abort (); + } +} diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_2-pcmpgtq.c b/gcc/testsuite/gcc.target/powerpc/sse4_2-pcmpgtq.c new file mode 100644 index 000000000000..4bfbad885b30 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/sse4_2-pcmpgtq.c @@ -0,0 +1,46 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mvsx" } */ +/* { dg-require-effective-target powerpc_vsx_ok } */ + +#ifndef CHECK_H +#define CHECK_H "sse4_2-check.h" +#endif + +#ifndef TEST +#define TEST sse4_2_test +#endif + +#include CHECK_H + +#include + +#define NUM 64 + +static void +TEST (void) +{ + union + { + __m128i x[NUM / 2]; + long long ll[NUM]; + } dst, src1, src2; + int i, sign = 1; + long long is_eq; + + for (i = 0; i < NUM; i++) + { + src1.ll[i] = i * i * sign; + src2.ll[i] = (i + 20) * sign; + sign = -sign; + } + + for (i = 0; i < NUM; i += 2) + dst.x[i / 2] = _mm_cmpgt_epi64 (src1.x[i / 2], src2.x[i / 2]); + + for (i = 0; i < NUM; i++) + { + is_eq = src1.ll[i] > src2.ll[i] ? 0xFFFFFFFFFFFFFFFFLL : 0LL; + if (is_eq != dst.ll[i]) + abort (); + } +} From patchwork Fri Aug 20 19:27:26 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Paul A. Clarke" X-Patchwork-Id: 1519171 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=KQ5j/Xy+; dkim-atps=neutral Received: from sourceware.org (ip-8-43-85-97.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4GrsK91sMkz9sRf for ; Sat, 21 Aug 2021 05:34:36 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 6A0FD3851C2C for ; Fri, 20 Aug 2021 19:34:34 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 6A0FD3851C2C DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1629488074; bh=OTjU16WZpkUV0Kye1X1qNsTN8QfDgHGxqj+ThD0WM1k=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=KQ5j/Xy+HsXi9bi+yjQ6m0BuZOlWvl8mCU83CvcMINiic5QWq6iPfo7tEgTGjCTSJ KxM8JZZGvWg8Otgr0E4aT7SW5ElD2Li31OiUqi1+ItqSOFMGpWdpQUKZNlqcA8ewL1 5GLnKbpXLTetYVRGE+6Q2v/+baFxCsdBABF6NUco= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mx0b-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by sourceware.org (Postfix) with ESMTPS id 6D3C23851C2C for ; Fri, 20 Aug 2021 19:27:46 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 6D3C23851C2C Received: from pps.filterd (m0098417.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.43/8.16.0.43) with SMTP id 17KJ32kL060292; Fri, 20 Aug 2021 15:27:46 -0400 Received: from ppma01wdc.us.ibm.com (fd.55.37a9.ip4.static.sl-reverse.com [169.55.85.253]) by mx0a-001b2d01.pphosted.com with ESMTP id 3aj068atmp-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 20 Aug 2021 15:27:45 -0400 Received: from pps.filterd (ppma01wdc.us.ibm.com [127.0.0.1]) by ppma01wdc.us.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 17KJ8Xl4021596; Fri, 20 Aug 2021 19:27:45 GMT Received: from b01cxnp22034.gho.pok.ibm.com (b01cxnp22034.gho.pok.ibm.com [9.57.198.24]) by ppma01wdc.us.ibm.com with ESMTP id 3ae5ffvnjs-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 20 Aug 2021 19:27:45 +0000 Received: from b01ledav004.gho.pok.ibm.com (b01ledav004.gho.pok.ibm.com [9.57.199.109]) by b01cxnp22034.gho.pok.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 17KJRiQf44368254 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 20 Aug 2021 19:27:44 GMT Received: from b01ledav004.gho.pok.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 85DB3112069; Fri, 20 Aug 2021 19:27:44 +0000 (GMT) Received: from b01ledav004.gho.pok.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 4BE2F112065; Fri, 20 Aug 2021 19:27:44 +0000 (GMT) Received: from localhost (unknown [9.77.137.12]) by b01ledav004.gho.pok.ibm.com (Postfix) with ESMTP; Fri, 20 Aug 2021 19:27:44 +0000 (GMT) To: gcc-patches@gcc.gnu.org Subject: [PATCH v2 6/6] rs6000: Guard some x86 intrinsics implementations Date: Fri, 20 Aug 2021 14:27:26 -0500 Message-Id: <20210820192726.1575524-7-pc@us.ibm.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20210820192726.1575524-1-pc@us.ibm.com> References: <20210820192726.1575524-1-pc@us.ibm.com> MIME-Version: 1.0 X-TM-AS-GCONF: 00 X-Proofpoint-GUID: 2nXneQL7sCS8l5gmFRCYoBb0KtFGJ-zz X-Proofpoint-ORIG-GUID: 2nXneQL7sCS8l5gmFRCYoBb0KtFGJ-zz X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.391, 18.0.790 definitions=2021-08-20_06:2021-08-20, 2021-08-20 signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 clxscore=1015 lowpriorityscore=0 mlxlogscore=999 mlxscore=0 adultscore=0 priorityscore=1501 suspectscore=0 phishscore=0 bulkscore=0 spamscore=0 malwarescore=0 impostorscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2107140000 definitions=main-2108200107 X-Spam-Status: No, score=-13.1 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: "Paul A. Clarke via Gcc-patches" From: "Paul A. Clarke" Reply-To: "Paul A. Clarke" Cc: wschmidt@linux.ibm.com, segher@kernel.crashing.org Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" Some compatibility implementations of x86 intrinsics include Power intrinsics which require POWER8. Guard them. emmintrin.h: - _mm_cmpord_pd: Remove code which was ostensibly for pre-POWER8, but which indeed depended on POWER8 (vec_cmpgt(v2du)/vcmpgtud). The "POWER8" version works fine on pre-POWER8. - _mm_mul_epu32: vec_mule(v4su) uses vmuleuw. pmmintrin.h: - _mm_movehdup_ps: vec_mergeo(v4su) uses vmrgow. - _mm_moveldup_ps: vec_mergee(v4su) uses vmrgew. smmintrin.h: - _mm_cmpeq_epi64: vec_cmpeq(v2di) uses vcmpequd. - _mm_mul_epi32: vec_mule(v4si) uses vmuluwm. - _mm_cmpgt_epi64: vec_cmpgt(v2di) uses vcmpgtsd. tmmintrin.h: - _mm_sign_epi8: vec_neg(v4si) uses vsububm. - _mm_sign_epi16: vec_neg(v4si) uses vsubuhm. - _mm_sign_epi32: vec_neg(v4si) uses vsubuwm. Note that the above three could actually be supported pre-POWER8, but current GCC does not support them before POWER8. - _mm_sign_pi8: depends on _mm_sign_epi8. - _mm_sign_pi16: depends on _mm_sign_epi16. - _mm_sign_pi32: depends on _mm_sign_epi32. 2021-08-20 Paul A. Clarke gcc PR target/101893 * config/rs6000/emmintrin.h: Guard POWER8 intrinsics. * config/rs6000/pmmintrin.h: Same. * config/rs6000/smmintrin.h: Same. * config/rs6000/tmmintrin.h: Same. --- v2: - Ensured that new "#ifdef _ARCH_PWR8" bracket each function so impacted, rather than groups of functions, per v1 review. - Noted testing in patch series cover letter. - Added PR number to commit message. gcc/config/rs6000/emmintrin.h | 12 ++---------- gcc/config/rs6000/pmmintrin.h | 4 ++++ gcc/config/rs6000/smmintrin.h | 4 ++++ gcc/config/rs6000/tmmintrin.h | 12 ++++++++++++ 4 files changed, 22 insertions(+), 10 deletions(-) diff --git a/gcc/config/rs6000/emmintrin.h b/gcc/config/rs6000/emmintrin.h index ce1287edf782..32ad72b4cc35 100644 --- a/gcc/config/rs6000/emmintrin.h +++ b/gcc/config/rs6000/emmintrin.h @@ -430,20 +430,10 @@ _mm_cmpnge_pd (__m128d __A, __m128d __B) extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_cmpord_pd (__m128d __A, __m128d __B) { -#if _ARCH_PWR8 __v2du c, d; /* Compare against self will return false (0's) if NAN. */ c = (__v2du)vec_cmpeq (__A, __A); d = (__v2du)vec_cmpeq (__B, __B); -#else - __v2du a, b; - __v2du c, d; - const __v2du double_exp_mask = {0x7ff0000000000000, 0x7ff0000000000000}; - a = (__v2du)vec_abs ((__v2df)__A); - b = (__v2du)vec_abs ((__v2df)__B); - c = (__v2du)vec_cmpgt (double_exp_mask, a); - d = (__v2du)vec_cmpgt (double_exp_mask, b); -#endif /* A != NAN and B != NAN. */ return ((__m128d)vec_and(c, d)); } @@ -1472,6 +1462,7 @@ _mm_mul_su32 (__m64 __A, __m64 __B) return ((__m64)a * (__m64)b); } +#ifdef _ARCH_PWR8 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_mul_epu32 (__m128i __A, __m128i __B) { @@ -1498,6 +1489,7 @@ _mm_mul_epu32 (__m128i __A, __m128i __B) return (__m128i) vec_mule ((__v4su)__A, (__v4su)__B); #endif } +#endif extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_slli_epi16 (__m128i __A, int __B) diff --git a/gcc/config/rs6000/pmmintrin.h b/gcc/config/rs6000/pmmintrin.h index eab712fdfa66..83dff1d85666 100644 --- a/gcc/config/rs6000/pmmintrin.h +++ b/gcc/config/rs6000/pmmintrin.h @@ -123,17 +123,21 @@ _mm_hsub_pd (__m128d __X, __m128d __Y) vec_mergel ((__v2df) __X, (__v2df)__Y)); } +#ifdef _ARCH_PWR8 extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_movehdup_ps (__m128 __X) { return (__m128)vec_mergeo ((__v4su)__X, (__v4su)__X); } +#endif +#ifdef _ARCH_PWR8 extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_moveldup_ps (__m128 __X) { return (__m128)vec_mergee ((__v4su)__X, (__v4su)__X); } +#endif extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_loaddup_pd (double const *__P) diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h index c04d2bb5b6d3..29719367e205 100644 --- a/gcc/config/rs6000/smmintrin.h +++ b/gcc/config/rs6000/smmintrin.h @@ -272,6 +272,7 @@ _mm_extract_ps (__m128 __X, const int __N) return ((__v4si)__X)[__N & 3]; } +#ifdef _ARCH_PWR8 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_blend_epi16 (__m128i __A, __m128i __B, const int __imm8) { @@ -283,6 +284,7 @@ _mm_blend_epi16 (__m128i __A, __m128i __B, const int __imm8) #endif return (__m128i) vec_sel ((__v8hu) __A, (__v8hu) __B, __shortmask); } +#endif extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_blendv_epi8 (__m128i __A, __m128i __B, __m128i __mask) @@ -343,6 +345,7 @@ _mm_blend_pd (__m128d __A, __m128d __B, const int __imm8) return (__m128d) __r; } +#ifdef _ARCH_PWR8 __inline __m128d __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm_blendv_pd (__m128d __A, __m128d __B, __m128d __mask) @@ -351,6 +354,7 @@ _mm_blendv_pd (__m128d __A, __m128d __B, __m128d __mask) const __vector __bool long long __boolmask = vec_cmplt ((__v2di) __mask, __zero); return (__m128d) vec_sel ((__v2du) __A, (__v2du) __B, (__v2du) __boolmask); } +#endif __inline int __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) diff --git a/gcc/config/rs6000/tmmintrin.h b/gcc/config/rs6000/tmmintrin.h index 971511260b78..a67d88c8079a 100644 --- a/gcc/config/rs6000/tmmintrin.h +++ b/gcc/config/rs6000/tmmintrin.h @@ -350,6 +350,7 @@ _mm_shuffle_pi8 (__m64 __A, __m64 __B) return (__m64) ((__v2du) (__C))[0]; } +#ifdef _ARCH_PWR8 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_sign_epi8 (__m128i __A, __m128i __B) @@ -361,7 +362,9 @@ _mm_sign_epi8 (__m128i __A, __m128i __B) __v16qi __conv = vec_add (__selectneg, __selectpos); return (__m128i) vec_mul ((__v16qi) __A, (__v16qi) __conv); } +#endif +#ifdef _ARCH_PWR8 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_sign_epi16 (__m128i __A, __m128i __B) @@ -373,7 +376,9 @@ _mm_sign_epi16 (__m128i __A, __m128i __B) __v8hi __conv = vec_add (__selectneg, __selectpos); return (__m128i) vec_mul ((__v8hi) __A, (__v8hi) __conv); } +#endif +#ifdef _ARCH_PWR8 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_sign_epi32 (__m128i __A, __m128i __B) @@ -385,7 +390,9 @@ _mm_sign_epi32 (__m128i __A, __m128i __B) __v4si __conv = vec_add (__selectneg, __selectpos); return (__m128i) vec_mul ((__v4si) __A, (__v4si) __conv); } +#endif +#ifdef _ARCH_PWR8 extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_sign_pi8 (__m64 __A, __m64 __B) @@ -396,7 +403,9 @@ _mm_sign_pi8 (__m64 __A, __m64 __B) __C = (__v16qi) _mm_sign_epi8 ((__m128i) __C, (__m128i) __D); return (__m64) ((__v2du) (__C))[0]; } +#endif +#ifdef _ARCH_PWR8 extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_sign_pi16 (__m64 __A, __m64 __B) @@ -407,7 +416,9 @@ _mm_sign_pi16 (__m64 __A, __m64 __B) __C = (__v8hi) _mm_sign_epi16 ((__m128i) __C, (__m128i) __D); return (__m64) ((__v2du) (__C))[0]; } +#endif +#ifdef _ARCH_PWR8 extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_sign_pi32 (__m64 __A, __m64 __B) @@ -418,6 +429,7 @@ _mm_sign_pi32 (__m64 __A, __m64 __B) __C = (__v4si) _mm_sign_epi32 ((__m128i) __C, (__m128i) __D); return (__m64) ((__v2du) (__C))[0]; } +#endif extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))