From patchwork Tue Oct 2 14:11:57 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Paul A. Clarke" X-Patchwork-Id: 977867 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org (client-ip=209.132.180.131; helo=sourceware.org; envelope-from=gcc-patches-return-486790-incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=us.ibm.com Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="dXx1Vl6X"; dkim-atps=neutral Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 42Ph1Y13ymz9t15 for ; Wed, 3 Oct 2018 00:12:32 +1000 (AEST) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:to:cc :from:subject:date:message-id:content-type :content-transfer-encoding:mime-version; q=dns; s=default; b=oc8 lPj1DAOEx6vgcA0P+jHjRCvRUpKFdXFWGhx/DIEeW+eOZiNduesnmjAPkMMdDLnx ArLbYamNKFQ6ko8krgzpdt56E+/ARqgNtVO/WjN44tYbhrB0JbWvazVsCQJ1tPaS /9CxpLbmgLA70TsJij8OsBMgksvGI6OSzi71vI/A= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:to:cc :from:subject:date:message-id:content-type :content-transfer-encoding:mime-version; s=default; bh=anon1qNpr 6aFfPNgStx45CQVqk8=; b=dXx1Vl6XhIq7YR5uSLFQK1Ijj+2ut5z27tTJUpkIP 9jWkw4TJQgaYTOvcagD2njY6FNqEhOER0NOy3vF8LwYlvpUMhK/CjYQnjguSOcJP /v95rk99KLEMy6iMV+Q22MDdXYZ7a+NolBHXz0ulXXeLhCZXjGBiYNN44UU7rppF 0Y= Received: (qmail 22443 invoked by alias); 2 Oct 2018 14:12:09 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 22356 invoked by uid 89); 2 Oct 2018 14:12:08 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-10.8 required=5.0 tests=BAYES_00, GIT_PATCH_2, GIT_PATCH_3, KAM_ASCII_DIVIDERS, KAM_LAZY_DOMAIN_SECURITY, KAM_SHORT, RCVD_IN_DNSWL_LOW autolearn=ham version=3.3.2 spammy=occasional, permissions, POWER, horizontal X-HELO: mx0a-001b2d01.pphosted.com Received: from mx0a-001b2d01.pphosted.com (HELO mx0a-001b2d01.pphosted.com) (148.163.156.1) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 02 Oct 2018 14:12:05 +0000 Received: from pps.filterd (m0098396.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.22/8.16.0.22) with SMTP id w92E9QtO018376 for ; Tue, 2 Oct 2018 10:12:03 -0400 Received: from e33.co.us.ibm.com (e33.co.us.ibm.com [32.97.110.151]) by mx0a-001b2d01.pphosted.com with ESMTP id 2mv9ue18g3-1 (version=TLSv1.2 cipher=AES256-GCM-SHA384 bits=256 verify=NOT) for ; Tue, 02 Oct 2018 10:12:03 -0400 Received: from localhost by e33.co.us.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Tue, 2 Oct 2018 08:12:02 -0600 Received: from b03cxnp08025.gho.boulder.ibm.com (9.17.130.17) by e33.co.us.ibm.com (192.168.1.133) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; (version=TLSv1/SSLv3 cipher=AES256-GCM-SHA384 bits=256/256) Tue, 2 Oct 2018 08:12:00 -0600 Received: from b03ledav006.gho.boulder.ibm.com (b03ledav006.gho.boulder.ibm.com [9.17.130.237]) by b03cxnp08025.gho.boulder.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id w92EBxoi19333288 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=FAIL); Tue, 2 Oct 2018 07:11:59 -0700 Received: from b03ledav006.gho.boulder.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 88298C6061; Tue, 2 Oct 2018 08:11:59 -0600 (MDT) Received: from b03ledav006.gho.boulder.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id EAF55C605A; Tue, 2 Oct 2018 08:11:58 -0600 (MDT) Received: from oc3272150783.ibm.com (unknown [9.85.134.13]) by b03ledav006.gho.boulder.ibm.com (Postfix) with ESMTPS; Tue, 2 Oct 2018 08:11:58 -0600 (MDT) To: gcc-patches@gcc.gnu.org Cc: Segher Boessenkool From: Paul Clarke Subject: [PATCH, rs6000] 1/2 Add x86 SSE3 intrinsics to GCC PPC64LE target Date: Tue, 2 Oct 2018 09:11:57 -0500 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1 x-cbid: 18100214-0036-0000-0000-00000A42C4C3 X-IBM-SpamModules-Scores: X-IBM-SpamModules-Versions: BY=3.00009809; HX=3.00000242; KW=3.00000007; PH=3.00000004; SC=3.00000267; SDB=6.01096813; UDB=6.00567173; IPR=6.00876856; MB=3.00023590; MTD=3.00000008; XFM=3.00000015; UTC=2018-10-02 14:12:01 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 18100214-0037-0000-0000-000049248030 Message-Id: MIME-Version: 1.0 This is a follow-on to earlier commits for adding compatibility implementations of x86 intrinsics for PPC64LE. This is the first of two patches. This patch adds 11 of the 13 x86 intrinsics from ("SSE3"). (Patch 2/2 adds tests for these intrinsics, and briefly describes the tests performed.) Implementations are relatively straightforward, with occasional extra effort for vector element ordering. Not implemented are _mm_wait and _mm_monitor, as there are no direct or trivial analogs in the POWER ISA. ./gcc/ChangeLog: 2018-10-02 Paul A. Clarke * config.gcc (powerpc*-*-*): Add pmmintrin.h. * config/rs6000/pmmintrin.h: New file. Index: gcc/config/rs6000/pmmintrin.h =================================================================== --- gcc/config/rs6000/pmmintrin.h (nonexistent) +++ gcc/config/rs6000/pmmintrin.h (working copy) @@ -0,0 +1,177 @@ +/* Copyright (C) 2003-2018 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + . */ + +/* Implemented from the specification included in the Intel C++ Compiler + User Guide and Reference, version 9.0. */ + +#ifndef NO_WARN_X86_INTRINSICS +/* This header is distributed to simplify porting x86_64 code that + makes explicit use of Intel intrinsics to powerpc64le. + It is the user's responsibility to determine if the results are + acceptable and make additional changes as necessary. + Note that much code that uses Intel intrinsics can be rewritten in + standard C or GNU C extensions, which are more portable and better + optimized across multiple targets. + + In the specific case of X86 SSE3 intrinsics, the PowerPC VMX/VSX ISA + is a good match for most SIMD operations. However the Horizontal + add/sub requires the data pairs be permuted into a separate + registers with vertical even/odd alignment for the operation. + And the addsub operation requires the sign of only the even numbered + elements be flipped (xored with -0.0). + For larger blocks of code using these intrinsic implementations, + the compiler be should be able to schedule instructions to avoid + additional latency. + + In the specific case of the monitor and mwait instructions there are + no direct equivalent in the PowerISA at this time. So those + intrinsics are not implemented. */ +#error "Please read comment above. Use -DNO_WARN_X86_INTRINSICS to disable this warning." +#endif + +#ifndef _PMMINTRIN_H_INCLUDED +#define _PMMINTRIN_H_INCLUDED + +/* We need definitions from the SSE2 and SSE header files*/ +#include + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_addsub_ps (__m128 __X, __m128 __Y) +{ + const __v4sf even_n0 = {-0.0, 0.0, -0.0, 0.0}; + __v4sf even_neg_Y = vec_xor(__Y, even_n0); + return (__m128) vec_add (__X, even_neg_Y); +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_addsub_pd (__m128d __X, __m128d __Y) +{ + const __v2df even_n0 = {-0.0, 0.0}; + __v2df even_neg_Y = vec_xor(__Y, even_n0); + return (__m128d) vec_add (__X, even_neg_Y); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_hadd_ps (__m128 __X, __m128 __Y) +{ + __vector unsigned char xform2 = { + #ifdef __LITTLE_ENDIAN__ + 0x00, 0x01, 0x02, 0x03, 0x08, 0x09, 0x0A, 0x0B, 0x10, 0x11, 0x12, 0x13, 0x18, 0x19, 0x1A, 0x1B + #elif __BIG_ENDIAN__ + 0x14, 0x15, 0x16, 0x17, 0x1C, 0x1D, 0x1E, 0x1F, 0x04, 0x05, 0x06, 0x07, 0x0C, 0x0D, 0x0E, 0x0F + #endif + }; + __vector unsigned char xform1 = { + #ifdef __LITTLE_ENDIAN__ + 0x04, 0x05, 0x06, 0x07, 0x0C, 0x0D, 0x0E, 0x0F, 0x14, 0x15, 0x16, 0x17, 0x1C, 0x1D, 0x1E, 0x1F + #elif __BIG_ENDIAN__ + 0x10, 0x11, 0x12, 0x13, 0x18, 0x19, 0x1A, 0x1B, 0x00, 0x01, 0x02, 0x03, 0x08, 0x09, 0x0A, 0x0B + #endif + }; + return (__m128) vec_add (vec_perm ((__v4sf) __X, (__v4sf) __Y, xform2), + vec_perm ((__v4sf) __X, (__v4sf) __Y, xform1)); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_hsub_ps (__m128 __X, __m128 __Y) +{ + __vector unsigned char xform2 = { + #ifdef __LITTLE_ENDIAN__ + 0x00, 0x01, 0x02, 0x03, 0x08, 0x09, 0x0A, 0x0B, 0x10, 0x11, 0x12, 0x13, 0x18, 0x19, 0x1A, 0x1B + #elif __BIG_ENDIAN__ + 0x14, 0x15, 0x16, 0x17, 0x1C, 0x1D, 0x1E, 0x1F, 0x04, 0x05, 0x06, 0x07, 0x0C, 0x0D, 0x0E, 0x0F + #endif + }; + __vector unsigned char xform1 = { + #ifdef __LITTLE_ENDIAN__ + 0x04, 0x05, 0x06, 0x07, 0x0C, 0x0D, 0x0E, 0x0F, 0x14, 0x15, 0x16, 0x17, 0x1C, 0x1D, 0x1E, 0x1F + #elif __BIG_ENDIAN__ + 0x10, 0x11, 0x12, 0x13, 0x18, 0x19, 0x1A, 0x1B, 0x00, 0x01, 0x02, 0x03, 0x08, 0x09, 0x0A, 0x0B + #endif + }; + return (__m128) vec_sub (vec_perm ((__v4sf) __X, (__v4sf) __Y, xform2), + vec_perm ((__v4sf) __X, (__v4sf) __Y, xform1)); +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_hadd_pd (__m128d __X, __m128d __Y) +{ + return (__m128d) vec_add (vec_mergeh ((__v2df) __X, (__v2df)__Y), + vec_mergel ((__v2df) __X, (__v2df)__Y)); +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_hsub_pd (__m128d __X, __m128d __Y) +{ + return (__m128d) vec_sub (vec_mergeh ((__v2df) __X, (__v2df)__Y), + vec_mergel ((__v2df) __X, (__v2df)__Y)); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_movehdup_ps (__m128 __X) +{ + return (__m128)vec_mergeo ((__v4su)__X, (__v4su)__X); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_moveldup_ps (__m128 __X) +{ + return (__m128)vec_mergee ((__v4su)__X, (__v4su)__X); +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_loaddup_pd (double const *__P) +{ + return (__m128d) vec_splats (*__P); +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_movedup_pd (__m128d __X) +{ + return _mm_shuffle_pd (__X, __X, _MM_SHUFFLE2 (0,0)); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_lddqu_si128 (__m128i const *__P) +{ + return (__m128i) (vec_vsx_ld(0, (signed int const *)__P)); +} + +#if 0 +/* POWER8 / POWER9 have no equivalent. */ + +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_monitor (void const * __P, unsigned int __E, unsigned int __H) +{ + __builtin_ia32_monitor (__P, __E, __H); +#error "_mm_monitor is not supported on this architecture." +} + +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mwait (unsigned int __E, unsigned int __H) +{ +#error "_mm_mwait is not supported on this architecture." +} +#endif + +#endif /* _PMMINTRIN_H_INCLUDED */ Index: gcc/config.gcc =================================================================== --- gcc/config.gcc (revision 264782) +++ gcc/config.gcc (working copy) @@ -484,6 +484,7 @@ extra_headers="${extra_headers} bmi2intrin.h bmiintrin.h" extra_headers="${extra_headers} xmmintrin.h mm_malloc.h emmintrin.h" extra_headers="${extra_headers} mmintrin.h x86intrin.h" + extra_headers="${extra_headers} pmmintrin.h" extra_headers="${extra_headers} ppu_intrinsics.h spu2vmx.h vec_types.h si2vmx.h" extra_headers="${extra_headers} amo.h" case x$with_cpu in