From patchwork Tue Oct 2 14:12:07 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Paul A. Clarke"
X-Patchwork-Id: 977868
To: gcc-patches@gcc.gnu.org
Cc: Segher Boessenkool
From: Paul Clarke
Subject: [PATCH, rs6000] 2/2 Add x86 SSE3 intrinsics to GCC PPC64LE target
Date: Tue, 2 Oct 2018 09:12:07 -0500
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1

This is part 2/2 for contributing PPC64LE support for X86 SSE3 intrinsics.
This patch includes testsuite/gcc.target tests for the intrinsics defined
in pmmintrin.h.

Tested on POWER8 ppc64le and ppc64 (-m64 and -m32, the latter only
reporting 10 new unsupported tests).

[gcc/testsuite]

2018-10-01  Paul A. Clarke

	* sse3-check.h: New file.
	* sse3-addsubps.c: New file.
	* sse3-addsubpd.c: New file.
	* sse3-haddps.c: New file.
	* sse3-hsubps.c: New file.
	* sse3-haddpd.c: New file.
	* sse3-hsubpd.c: New file.
	* sse3-lddqu.c: New file.
	* sse3-movsldup.c: New file.
	* sse3-movshdup.c: New file.
	* sse3-movddup.c: New file.

Index: gcc/testsuite/gcc.target/powerpc/pr37191.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr37191.c (nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/pr37191.c (working copy)
@@ -0,0 +1,49 @@
+/* { dg-do compile } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-options "-O3 -mdirect-move" } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-require-effective-target p8vector_hw } */
+
+#define NO_WARN_X86_INTRINSICS 1
+
+#include <mmintrin.h>
+#include <stddef.h>
+#include <stdint.h>
+
+//extern
+const uint64_t ff_bone;
+
+static inline void transpose4x4(uint8_t *dst, uint8_t *src, ptrdiff_t dst_stride, ptrdiff_t src_stride) {
+  __m64 row0 = _mm_cvtsi32_si64(*(unsigned*)(src + (0 * src_stride)));
+  __m64 row1 = _mm_cvtsi32_si64(*(unsigned*)(src + (1 * src_stride)));
+  __m64 row2 = _mm_cvtsi32_si64(*(unsigned*)(src + (2 * src_stride)));
+  __m64 row3 = _mm_cvtsi32_si64(*(unsigned*)(src + (3 * src_stride)));
+  __m64 tmp0 = _mm_unpacklo_pi8(row0, row1);
+  __m64 tmp1 = _mm_unpacklo_pi8(row2, row3);
+  __m64 row01 = _mm_unpacklo_pi16(tmp0, tmp1);
+  __m64 row23 = _mm_unpackhi_pi16(tmp0, tmp1);
+  *((unsigned*)(dst + (0 * dst_stride))) = _mm_cvtsi64_si32(row01);
+  *((unsigned*)(dst + (1 * dst_stride))) = _mm_cvtsi64_si32(_mm_unpackhi_pi32(row01, row01));
+  *((unsigned*)(dst + (2 * dst_stride))) = _mm_cvtsi64_si32(row23);
+  *((unsigned*)(dst + (3 * dst_stride))) = _mm_cvtsi64_si32(_mm_unpackhi_pi32(row23, row23));
+}
+#if 0
+static inline void h264_loop_filter_chroma_intra_mmx2(uint8_t *pix, int stride, int alpha1, int beta1)
+{
+  asm volatile(
+    ""
+    :: "r"(pix-2*stride), "r"(pix), "r"((long)stride),
+       "m"(alpha1), "m"(beta1), "m"(ff_bone)
+  );
+}
+
+#endif
+void h264_h_loop_filter_chroma_intra_mmx2(uint8_t *pix, int stride, int alpha, int beta)
+{
+  uint8_t trans[8*4] __attribute__ ((aligned (8)));
+  transpose4x4(trans, pix-2, 8, stride);
+  transpose4x4(trans+4, pix-2+4*stride, 8, stride);
+//  h264_loop_filter_chroma_intra_mmx2(trans+2*8, 8, alpha-1, beta-1);
+  transpose4x4(pix-2, trans, stride, 8);
+  transpose4x4(pix-2+4*stride, trans+4, stride, 8);
+}
Index: gcc/testsuite/gcc.target/powerpc/sse3-addsubpd.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/sse3-addsubpd.c (nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/sse3-addsubpd.c (working copy)
@@ -0,0 +1,102 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -mpower8-vector -Wno-psabi" } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-require-effective-target p8vector_hw } */
+
+#ifndef CHECK_H
+#define CHECK_H "sse3-check.h"
+#endif
+
+#include CHECK_H
+
+#ifndef TEST
+#define TEST sse3_test_addsubpd_1
+#endif
+
+#define NO_WARN_X86_INTRINSICS 1
+#include <pmmintrin.h>
+
+static void
+sse3_test_addsubpd (double *i1, double *i2, double *r)
+{
+  __m128d t1 = _mm_loadu_pd (i1);
+  __m128d t2 = _mm_loadu_pd (i2);
+
+  t1 = _mm_addsub_pd (t1, t2);
+
+  _mm_storeu_pd (r, t1);
+}
+
+static void
+sse3_test_addsubpd_subsume (double *i1, double *i2, double *r)
+{
+  __m128d t1 = _mm_load_pd (i1);
+  __m128d t2 = _mm_load_pd (i2);
+
+  t1 = _mm_addsub_pd (t1, t2);
+
+  _mm_storeu_pd (r, t1);
+}
+
+static int
+chk_pd (double *v1, double *v2)
+{
+  int i;
+  int n_fails = 0;
+
+  for (i = 0; i < 2; i++)
+    if (v1[i] != v2[i])
+      n_fails += 1;
+
+  return n_fails;
+}
+
+static double p1[2] __attribute__ ((aligned(16)));
+static double p2[2] __attribute__ ((aligned(16)));
+static double p3[2];
+static double ck[2];
+
+double vals[] =
+  {
+    100.0, 200.0, 300.0, 400.0, 5.0, -1.0, .345, -21.5,
+    1100.0, 0.235, 321.3, 53.40, 0.3, 10.0, 42.0, 32.52,
+    32.6, 123.3, 1.234, 2.156, 0.1, 3.25, 4.75, 32.44,
+    12.16, 52.34, 64.12, 71.13, -.1, 2.30, 5.12, 3.785,
+    541.3, 321.4, 231.4, 531.4, 71., 321., 231., -531.,
+    23.45, 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, 23.45,
+    23.45, -1.43, -6.74, 6.345, -20.1, -20.1, -40.1, -40.1,
+    1.234, 2.345, 3.456, 4.567, 5.678, 6.789, 7.891, 8.912,
+    -9.32, -8.41, -7.50, -6.59, -5.68, -4.77, -3.86, -2.95,
+    9.32, 8.41, 7.50, 6.59, -5.68, -4.77, -3.86, -2.95
+  };
+
+static
+void
+TEST (void)
+{
+  int i;
+  int fail = 0;
+
+  for (i = 0; i < sizeof (vals) / sizeof (vals[0]); i += 4)
+    {
+      p1[0] = vals[i+0];
+      p1[1] = vals[i+1];
+
+      p2[0] = vals[i+2];
+      p2[1] = vals[i+3];
+
+      ck[0] = p1[0] - p2[0];
+      ck[1] = p1[1] + p2[1];
+
+      sse3_test_addsubpd (p1, p2, p3);
+
+      fail += chk_pd (ck, p3);
+
+      sse3_test_addsubpd_subsume (p1, p2, p3);
+
+      fail += chk_pd (ck, p3);
+    }
+
+  if (fail != 0)
+    abort ();
+}
Index: gcc/testsuite/gcc.target/powerpc/sse3-addsubps.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/sse3-addsubps.c (nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/sse3-addsubps.c (working copy)
@@ -0,0 +1,108 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -mpower8-vector -Wno-psabi" } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-require-effective-target p8vector_hw } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#ifndef CHECK_H
+#define CHECK_H "sse3-check.h"
+#endif
+
+#include CHECK_H
+
+#ifndef TEST
+#define TEST sse3_test_addsubps_1
+#endif
+
+#include <pmmintrin.h>
+
+static void
+sse3_test_addsubps (float *i1, float *i2, float *r)
+{
+  __m128 t1 = _mm_loadu_ps (i1);
+  __m128 t2 = _mm_loadu_ps (i2);
+
+  t1 = _mm_addsub_ps (t1, t2);
+
+  _mm_storeu_ps (r, t1);
+}
+
+static void
+sse3_test_addsubps_subsume (float *i1, float *i2, float *r)
+{
+  __m128 t1 = _mm_load_ps (i1);
+  __m128 t2 = _mm_load_ps (i2);
+
+  t1 = _mm_addsub_ps (t1, t2);
+
+  _mm_storeu_ps (r, t1);
+}
+
+static int
+chk_ps (float *v1, float *v2)
+{
+  int i;
+  int n_fails = 0;
+
+  for (i = 0; i < 4; i++)
+    if (v1[i] != v2[i])
+      n_fails += 1;
+
+  return n_fails;
+}
+
+static float p1[4] __attribute__ ((aligned(16)));
+static float p2[4] __attribute__ ((aligned(16)));
+static float p3[4];
+static float ck[4];
+
+static float vals[] =
+  {
+    100.0, 200.0, 300.0, 400.0, 5.0, -1.0, .345, -21.5,
+    1100.0, 0.235, 321.3, 53.40, 0.3, 10.0, 42.0, 32.52,
+    32.6, 123.3, 1.234, 2.156, 0.1, 3.25, 4.75, 32.44,
+    12.16, 52.34, 64.12, 71.13, -.1, 2.30, 5.12, 3.785,
+    541.3, 321.4, 231.4, 531.4, 71., 321., 231., -531.,
+    23.45, 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, 23.45,
+    23.45, -1.43, -6.74, 6.345, -20.1, -20.1, -40.1, -40.1,
+    1.234, 2.345, 3.456, 4.567, 5.678, 6.789, 7.891, 8.912,
+    -9.32, -8.41, -7.50, -6.59, -5.68, -4.77, -3.86, -2.95,
+    9.32, 8.41, 7.50, 6.59, -5.68, -4.77, -3.86, -2.95
+  };
+
+//static
+void
+TEST (void)
+{
+  int i;
+  int fail = 0;
+
+  for (i = 0; i < sizeof (vals) / sizeof (vals[0]); i += 8)
+    {
+      p1[0] = vals[i+0];
+      p1[1] = vals[i+1];
+      p1[2] = vals[i+2];
+      p1[3] = vals[i+3];
+
+      p2[0] = vals[i+4];
+      p2[1] = vals[i+5];
+      p2[2] = vals[i+6];
+      p2[3] = vals[i+7];
+
+      ck[0] = p1[0] - p2[0];
+      ck[1] = p1[1] + p2[1];
+      ck[2] = p1[2] - p2[2];
+      ck[3] = p1[3] + p2[3];
+
+      sse3_test_addsubps (p1, p2, p3);
+
+      fail += chk_ps (ck, p3);
+
+      sse3_test_addsubps_subsume (p1, p2, p3);
+
+      fail += chk_ps (ck, p3);
+    }
+
+  if (fail != 0)
+    abort ();
+}
Index: gcc/testsuite/gcc.target/powerpc/sse3-check.h
===================================================================
--- gcc/testsuite/gcc.target/powerpc/sse3-check.h (nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/sse3-check.h (working copy)
@@ -0,0 +1,43 @@
+#include <stdio.h>
+#include <stdlib.h>
+
+#include "m128-check.h"
+
+/* define DEBUG replace abort with printf on error.  */
+//#define DEBUG 1
+
+#define TEST sse3_test
+
+static void sse3_test (void);
+
+static void
+__attribute__ ((noinline))
+do_test (void)
+{
+  sse3_test ();
+}
+
+int
+main ()
+{
+#ifdef __BUILTIN_CPU_SUPPORTS__
+  /* Most SSE intrinsic operations can be implemented via VMX
+     instructions, but some operations may be faster / simpler
+     using the POWER8 VSX instructions.  This is especially true
+     when we are transferring / converting to / from __m64 types.
+     The direct register transfer instructions from POWER8 are
+     especially important.  So we test for arch_2_07.  */
+  if (__builtin_cpu_supports ("arch_2_07"))
+    {
+      do_test ();
+#ifdef DEBUG
+      printf ("PASSED\n");
+#endif
+    }
+#ifdef DEBUG
+  else
+    printf ("SKIPPED\n");
+#endif
+#endif /* __BUILTIN_CPU_SUPPORTS__ */
+  return 0;
+}
Index: gcc/testsuite/gcc.target/powerpc/sse3-haddpd.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/sse3-haddpd.c (nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/sse3-haddpd.c (working copy)
@@ -0,0 +1,100 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -mpower8-vector -Wno-psabi" } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-require-effective-target p8vector_hw } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#ifndef CHECK_H
+#define CHECK_H "sse3-check.h"
+#endif
+
+#include CHECK_H
+
+#ifndef TEST
+#define TEST sse3_test_haddpd_1
+#endif
+#include <pmmintrin.h>
+
+static void
+sse3_test_haddpd (double *i1, double *i2, double *r)
+{
+  __m128d t1 = _mm_loadu_pd (i1);
+  __m128d t2 = _mm_loadu_pd (i2);
+
+  t1 = _mm_hadd_pd (t1, t2);
+
+  _mm_storeu_pd (r, t1);
+}
+
+static void
+sse3_test_haddpd_subsume (double *i1, double *i2, double *r)
+{
+  __m128d t1 = _mm_load_pd (i1);
+  __m128d t2 = _mm_load_pd (i2);
+
+  t1 = _mm_hadd_pd (t1, t2);
+
+  _mm_storeu_pd (r, t1);
+}
+
+static int
+chk_pd (double *v1, double *v2)
+{
+  int i;
+  int n_fails = 0;
+
+  for (i = 0; i < 2; i++)
+    if (v1[i] != v2[i])
+      n_fails += 1;
+
+  return n_fails;
+}
+
+static double p1[2] __attribute__ ((aligned(16)));
+static double p2[2] __attribute__ ((aligned(16)));
+static double p3[2];
+static double ck[2];
+
+static double vals[] =
+  {
+    100.0, 200.0, 300.0, 400.0, 5.0, -1.0, .345, -21.5,
+    1100.0, 0.235, 321.3, 53.40, 0.3, 10.0, 42.0, 32.52,
+    32.6, 123.3, 1.234, 2.156, 0.1, 3.25, 4.75, 32.44,
+    12.16, 52.34, 64.12, 71.13, -.1, 2.30, 5.12, 3.785,
+    541.3, 321.4, 231.4, 531.4, 71., 321., 231., -531.,
+    23.45, 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, 23.45,
+    23.45, -1.43, -6.74, 6.345, -20.1, -20.1, -40.1, -40.1,
+    1.234, 2.345, 3.456, 4.567, 5.678, 6.789, 7.891, 8.912,
+    -9.32, -8.41, -7.50, -6.59, -5.68, -4.77, -3.86, -2.95,
+    9.32, 8.41, 7.50, 6.59, -5.68, -4.77, -3.86, -2.95
+  };
+
+//static
+void TEST (void)
+{
+  int i;
+  int fail = 0;
+
+  for (i = 0; i < sizeof (vals) / sizeof (vals[0]); i += 4)
+    {
+      p1[0] = vals[i + 0];
+      p1[1] = vals[i + 1];
+
+      p2[0] = vals[i + 2];
+      p2[1] = vals[i + 3];
+
+      ck[0] = p1[0] + p1[1];
+      ck[1] = p2[0] + p2[1];
+
+      sse3_test_haddpd (p1, p2, p3);
+
+      fail += chk_pd (ck, p3);
+
+      sse3_test_haddpd_subsume (p1, p2, p3);
+
+      fail += chk_pd (ck, p3);
+    }
+
+  if (fail != 0)
+    abort ();
+}
Index: gcc/testsuite/gcc.target/powerpc/sse3-haddps.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/sse3-haddps.c (nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/sse3-haddps.c (working copy)
@@ -0,0 +1,108 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -mpower8-vector -Wno-psabi" } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-require-effective-target p8vector_hw } */
+
+#ifndef CHECK_H
+#define CHECK_H "sse3-check.h"
+#endif
+
+#include CHECK_H
+
+#ifndef TEST
+#define TEST sse3_test_haddps_1
+#endif
+
+#define NO_WARN_X86_INTRINSICS 1
+#include <pmmintrin.h>
+
+static void
+sse3_test_haddps (float *i1, float *i2, float *r)
+{
+  __m128 t1 = _mm_loadu_ps (i1);
+  __m128 t2 = _mm_loadu_ps (i2);
+
+  t1 = _mm_hadd_ps (t1, t2);
+
+  _mm_storeu_ps (r, t1);
+}
+
+static void
+sse3_test_haddps_subsume (float *i1, float *i2, float *r)
+{
+  __m128 t1 = _mm_load_ps (i1);
+  __m128 t2 = _mm_load_ps (i2);
+
+  t1 = _mm_hadd_ps (t1, t2);
+
+  _mm_storeu_ps (r, t1);
+}
+
+static int
+chk_ps(float *v1, float *v2)
+{
+  int i;
+  int n_fails = 0;
+
+  for (i = 0; i < 4; i++)
+    if (v1[i] != v2[i])
+      n_fails += 1;
+
+  return n_fails;
+}
+
+static float p1[4] __attribute__ ((aligned(16)));
+static float p2[4] __attribute__ ((aligned(16)));
+static float p3[4];
+static float ck[4];
+
+static float vals[] =
+  {
+    100.0, 200.0, 300.0, 400.0, 5.0, -1.0, .345, -21.5,
+    1100.0, 0.235, 321.3, 53.40, 0.3, 10.0, 42.0, 32.52,
+    32.6, 123.3, 1.234, 2.156, 0.1, 3.25, 4.75, 32.44,
+    12.16, 52.34, 64.12, 71.13, -.1, 2.30, 5.12, 3.785,
+    541.3, 321.4, 231.4, 531.4, 71., 321., 231., -531.,
+    23.45, 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, 23.45,
+    23.45, -1.43, -6.74, 6.345, -20.1, -20.1, -40.1, -40.1,
+    1.234, 2.345, 3.456, 4.567, 5.678, 6.789, 7.891, 8.912,
+    -9.32, -8.41, -7.50, -6.59, -5.68, -4.77, -3.86, -2.95,
+    9.32, 8.41, 7.50, 6.59, -5.68, -4.77, -3.86, -2.95
+  };
+
+//static
+void
+TEST ()
+{
+  int i;
+  int fail = 0;
+
+  for (i = 0; i < sizeof (vals) / sizeof (vals[0]); i += 8)
+    {
+      p1[0] = vals[i+0];
+      p1[1] = vals[i+1];
+      p1[2] = vals[i+2];
+      p1[3] = vals[i+3];
+
+      p2[0] = vals[i+4];
+      p2[1] = vals[i+5];
+      p2[2] = vals[i+6];
+      p2[3] = vals[i+7];
+
+      ck[0] = p1[0] + p1[1];
+      ck[1] = p1[2] + p1[3];
+      ck[2] = p2[0] + p2[1];
+      ck[3] = p2[2] + p2[3];
+
+      sse3_test_haddps (p1, p2, p3);
+
+      fail += chk_ps (ck, p3);
+
+      sse3_test_haddps_subsume (p1, p2, p3);
+
+      fail += chk_ps (ck, p3);
+    }
+
+  if (fail != 0)
+    abort ();
+}
Index: gcc/testsuite/gcc.target/powerpc/sse3-hsubpd.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/sse3-hsubpd.c (nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/sse3-hsubpd.c (working copy)
@@ -0,0 +1,101 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -mpower8-vector -Wno-psabi" } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-require-effective-target p8vector_hw } */
+
+#ifndef CHECK_H
+#define CHECK_H "sse3-check.h"
+#endif
+
+#include CHECK_H
+
+#ifndef TEST
+#define TEST sse3_test_hsubpd_1
+#endif
+
+#define NO_WARN_X86_INTRINSICS 1
+#include <pmmintrin.h>
+
+static void
+sse3_test_hsubpd (double *i1, double *i2, double *r)
+{
+  __m128d t1 = _mm_loadu_pd (i1);
+  __m128d t2 = _mm_loadu_pd (i2);
+
+  t1 = _mm_hsub_pd (t1, t2);
+
+  _mm_storeu_pd (r, t1);
+}
+
+static void
+sse3_test_hsubpd_subsume (double *i1, double *i2, double *r)
+{
+  __m128d t1 = _mm_load_pd (i1);
+  __m128d t2 = _mm_load_pd (i2);
+
+  t1 = _mm_hsub_pd (t1, t2);
+
+  _mm_storeu_pd (r, t1);
+}
+
+static int
+chk_pd (double *v1, double *v2)
+{
+  int i;
+  int n_fails = 0;
+
+  for (i = 0; i < 2; i++)
+    if (v1[i] != v2[i])
+      n_fails += 1;
+
+  return n_fails;
+}
+
+static double p1[2] __attribute__ ((aligned(16)));
+static double p2[2] __attribute__ ((aligned(16)));
+static double p3[2];
+static double ck[2];
+
+static double vals[] =
+  {
+    100.0, 200.0, 300.0, 400.0, 5.0, -1.0, .345, -21.5,
+    1100.0, 0.235, 321.3, 53.40, 0.3, 10.0, 42.0, 32.52,
+    32.6, 123.3, 1.234, 2.156, 0.1, 3.25, 4.75, 32.44,
+    12.16, 52.34, 64.12, 71.13, -.1, 2.30, 5.12, 3.785,
+    541.3, 321.4, 231.4, 531.4, 71., 321., 231., -531.,
+    23.45, 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, 23.45,
+    23.45, -1.43, -6.74, 6.345, -20.1, -20.1, -40.1, -40.1,
+    1.234, 2.345, 3.456, 4.567, 5.678, 6.789, 7.891, 8.912,
+    -9.32, -8.41, -7.50, -6.59, -5.68, -4.77, -3.86, -2.95,
+    9.32, 8.41, 7.50, 6.59, -5.68, -4.77, -3.86, -2.95
+  };
+
+//static
+void TEST (void)
+{
+  int i;
+  int fail = 0;
+
+  for (i = 0; i < sizeof (vals) / sizeof (vals[0]); i += 4)
+    {
+      p1[0] = vals[i + 0];
+      p1[1] = vals[i + 1];
+
+      p2[0] = vals[i + 2];
+      p2[1] = vals[i + 3];
+
+      ck[0] = p1[0] - p1[1];
+      ck[1] = p2[0] - p2[1];
+
+      sse3_test_hsubpd (p1, p2, p3);
+
+      fail += chk_pd (ck, p3);
+
+      sse3_test_hsubpd_subsume (p1, p2, p3);
+
+      fail += chk_pd (ck, p3);
+    }
+
+  if (fail != 0)
+    abort ();
+}
Index: gcc/testsuite/gcc.target/powerpc/sse3-hsubps.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/sse3-hsubps.c (nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/sse3-hsubps.c (working copy)
@@ -0,0 +1,108 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -mpower8-vector -Wno-psabi" } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-require-effective-target p8vector_hw } */
+
+#ifndef CHECK_H
+#define CHECK_H "sse3-check.h"
+#endif
+
+#include CHECK_H
+
+#ifndef TEST
+#define TEST sse3_test_hsubps_1
+#endif
+
+#define NO_WARN_X86_INTRINSICS 1
+#include <pmmintrin.h>
+
+static void
+sse3_test_hsubps (float *i1, float *i2, float *r)
+{
+  __m128 t1 = _mm_loadu_ps (i1);
+  __m128 t2 = _mm_loadu_ps (i2);
+
+  t1 = _mm_hsub_ps (t1, t2);
+
+  _mm_storeu_ps (r, t1);
+}
+
+static void
+sse3_test_hsubps_subsume (float *i1, float *i2, float *r)
+{
+  __m128 t1 = _mm_load_ps (i1);
+  __m128 t2 = _mm_load_ps (i2);
+
+  t1 = _mm_hsub_ps (t1, t2);
+
+  _mm_storeu_ps (r, t1);
+}
+
+static int
+chk_ps(float *v1, float *v2)
+{
+  int i;
+  int n_fails = 0;
+
+  for (i = 0; i < 4; i++)
+    if (v1[i] != v2[i])
+      n_fails += 1;
+
+  return n_fails;
+}
+
+static float p1[4] __attribute__ ((aligned(16)));
+static float p2[4] __attribute__ ((aligned(16)));
+static float p3[4];
+static float ck[4];
+
+static float vals[] =
+  {
+    100.0, 200.0, 300.0, 400.0, 5.0, -1.0, .345, -21.5,
+    1100.0, 0.235, 321.3, 53.40, 0.3, 10.0, 42.0, 32.52,
+    32.6, 123.3, 1.234, 2.156, 0.1, 3.25, 4.75, 32.44,
+    12.16, 52.34, 64.12, 71.13, -.1, 2.30, 5.12, 3.785,
+    541.3, 321.4, 231.4, 531.4, 71., 321., 231., -531.,
+    23.45, 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, 23.45,
+    23.45, -1.43, -6.74, 6.345, -20.1, -20.1, -40.1, -40.1,
+    1.234, 2.345, 3.456, 4.567, 5.678, 6.789, 7.891, 8.912,
+    -9.32, -8.41, -7.50, -6.59, -5.68, -4.77, -3.86, -2.95,
+    9.32, 8.41, 7.50, 6.59, -5.68, -4.77, -3.86, -2.95
+  };
+
+//static
+void
+TEST ()
+{
+  int i;
+  int fail = 0;
+
+  for (i = 0; i < sizeof (vals) / sizeof (vals[0]); i += 8)
+    {
+      p1[0] = vals[i+0];
+      p1[1] = vals[i+1];
+      p1[2] = vals[i+2];
+      p1[3] = vals[i+3];
+
+      p2[0] = vals[i+4];
+      p2[1] = vals[i+5];
+      p2[2] = vals[i+6];
+      p2[3] = vals[i+7];
+
+      ck[0] = p1[0] - p1[1];
+      ck[1] = p1[2] - p1[3];
+      ck[2] = p2[0] - p2[1];
+      ck[3] = p2[2] - p2[3];
+
+      sse3_test_hsubps (p1, p2, p3);
+
+      fail += chk_ps (ck, p3);
+
+      sse3_test_hsubps_subsume (p1, p2, p3);
+
+      fail += chk_ps (ck, p3);
+    }
+
+  if (fail != 0)
+    abort ();
+}
Index: gcc/testsuite/gcc.target/powerpc/sse3-lddqu.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/sse3-lddqu.c (nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/sse3-lddqu.c (working copy)
@@ -0,0 +1,80 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -mpower8-vector -Wno-psabi" } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-require-effective-target p8vector_hw } */
+
+#ifndef CHECK_H
+#define CHECK_H "sse3-check.h"
+#endif
+
+#include CHECK_H
+
+#ifndef TEST
+#define TEST sse3_test_lddqu_1
+#endif
+
+#define NO_WARN_X86_INTRINSICS 1
+#include <pmmintrin.h>
+
+static void
+sse3_test_lddqu (double *i1, double *r)
+{
+  __m128i t1 = _mm_lddqu_si128 ((__m128i *) i1);
+
+  _mm_storeu_si128 ((__m128i *) r, t1);
+}
+
+static int
+chk_pd (double *v1, double *v2)
+{
+  int i;
+  int n_fails = 0;
+
+  for (i = 0; i < 2; i++)
+    if (v1[i] != v2[i])
+      n_fails += 1;
+
+  return n_fails;
+}
+
+static double p1[2];
+static double p2[2];
+static double ck[2];
+
+static double vals[] =
+  {
+    100.0, 200.0, 300.0, 400.0, 5.0, -1.0, .345, -21.5,
+    1100.0, 0.235, 321.3, 53.40, 0.3, 10.0, 42.0, 32.52,
+    32.6, 123.3, 1.234, 2.156, 0.1, 3.25, 4.75, 32.44,
+    12.16, 52.34, 64.12, 71.13, -.1, 2.30, 5.12, 3.785,
+    541.3, 321.4, 231.4, 531.4, 71., 321., 231., -531.,
+    23.45, 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, 23.45,
+    23.45, -1.43, -6.74, 6.345, -20.1, -20.1, -40.1, -40.1,
+    1.234, 2.345, 3.456, 4.567, 5.678, 6.789, 7.891, 8.912,
+    -9.32, -8.41, -7.50, -6.59, -5.68, -4.77, -3.86, -2.95,
+    9.32, 8.41, 7.50, 6.59, -5.68, -4.77, -3.86, -2.95
+  };
+
+//static
+void
+TEST (void)
+{
+  int i;
+  int fail = 0;
+
+  for (i = 0; i < sizeof (vals) / sizeof (vals[0]); i += 2)
+    {
+      p1[0] = vals[i+0];
+      p1[1] = vals[i+1];
+
+      sse3_test_lddqu (p1, p2);
+
+      ck[0] = p1[0];
+      ck[1] = p1[1];
+
+      fail += chk_pd (ck, p2);
+    }
+
+  if (fail != 0)
+    abort ();
+}
Index: gcc/testsuite/gcc.target/powerpc/sse3-movddup.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/sse3-movddup.c (nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/sse3-movddup.c (working copy)
@@ -0,0 +1,135 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -mpower8-vector -Wno-psabi" } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-require-effective-target p8vector_hw } */
+
+#ifndef CHECK_H
+#define CHECK_H "sse3-check.h"
+#endif
+#include CHECK_H
+
+#ifndef TEST
+#define TEST sse3_test_movddup_1
+#endif
+
+#define NO_WARN_X86_INTRINSICS 1
+#include <pmmintrin.h>
+
+static void
+sse3_test_movddup_mem (double *i1, double *r)
+{
+  __m128d t1 = _mm_loaddup_pd (i1);
+
+  _mm_storeu_pd (r, t1);
+}
+
+static double cnst1 [2] = {1.0, 1.0};
+
+static void
+sse3_test_movddup_reg (double *i1, double *r)
+{
+  __m128d t1 = _mm_loadu_pd (i1);
+  __m128d t2 = _mm_loadu_pd (&cnst1[0]);
+
+  t1 = _mm_mul_pd (t1, t2);
+  t2 = _mm_movedup_pd (t1);
+
+  _mm_storeu_pd (r, t2);
+}
+
+static void
+sse3_test_movddup_reg_subsume_unaligned (double *i1, double *r)
+{
+  __m128d t1 = _mm_loadu_pd (i1);
+  __m128d t2 = _mm_movedup_pd (t1);
+
+  _mm_storeu_pd (r, t2);
+}
+
+static void
+sse3_test_movddup_reg_subsume_ldsd (double *i1, double *r)
+{
+  __m128d t1 = _mm_load_sd (i1);
+  __m128d t2 = _mm_movedup_pd (t1);
+
+  _mm_storeu_pd (r, t2);
+}
+
+static void
+sse3_test_movddup_reg_subsume (double *i1, double *r)
+{
+  __m128d t1 = _mm_load_pd (i1);
+  __m128d t2 = _mm_movedup_pd (t1);
+
+  _mm_storeu_pd (r, t2);
+}
+
+static int
+chk_pd (double *v1, double *v2)
+{
+  int i;
+  int n_fails = 0;
+
+  for (i = 0; i < 2; i++)
+    if (v1[i] != v2[i])
+      n_fails += 1;
+
+  return n_fails;
+}
+
+static double p1[2] __attribute__ ((aligned(16)));
+static double p2[2];
+static double ck[2];
+
+static double vals[] =
+  {
+    100.0, 200.0, 300.0, 400.0, 5.0, -1.0, .345, -21.5,
+    1100.0, 0.235, 321.3, 53.40, 0.3, 10.0, 42.0, 32.52,
+    32.6, 123.3, 1.234, 2.156, 0.1, 3.25, 4.75, 32.44,
+    12.16, 52.34, 64.12, 71.13, -.1, 2.30, 5.12, 3.785,
+    541.3, 321.4, 231.4, 531.4, 71., 321., 231., -531.,
+    23.45, 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, 23.45,
+    23.45, -1.43, -6.74, 6.345, -20.1, -20.1, -40.1, -40.1,
+    1.234, 2.345, 3.456, 4.567, 5.678, 6.789, 7.891, 8.912,
+    -9.32, -8.41, -7.50, -6.59, -5.68, -4.77, -3.86, -2.95,
+    9.32, 8.41, 7.50, 6.59, -5.68, -4.77, -3.86, -2.95
+  };
+
+//static
+void
+TEST (void)
+{
+  int i;
+  int fail = 0;
+
+  for (i = 0; i < sizeof (vals) / sizeof (vals[0]); i += 1)
+    {
+      p1[0] = vals[i+0];
+
+      ck[0] = p1[0];
+      ck[1] = p1[0];
+
+      sse3_test_movddup_mem (p1, p2);
+
+      fail += chk_pd (ck, p2);
+
+      sse3_test_movddup_reg (p1, p2);
+
+      fail += chk_pd (ck, p2);
+
+      sse3_test_movddup_reg_subsume (p1, p2);
+
+      fail += chk_pd (ck, p2);
+
+      sse3_test_movddup_reg_subsume_unaligned (p1, p2);
+
+      fail += chk_pd (ck, p2);
+
+      sse3_test_movddup_reg_subsume_ldsd (p1, p2);
+
+      fail += chk_pd (ck, p2);
+    }
+
+  if (fail != 0)
+    abort ();
+}
Index: gcc/testsuite/gcc.target/powerpc/sse3-movshdup.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/sse3-movshdup.c (nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/sse3-movshdup.c (working copy)
@@ -0,0 +1,98 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -mpower8-vector -Wno-psabi" } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-require-effective-target p8vector_hw } */
+
+#ifndef CHECK_H
+#define CHECK_H "sse3-check.h"
+#endif
+
+#include CHECK_H
+
+#ifndef TEST
+#define TEST sse3_test_movshdup_1
+#endif
+
+#define NO_WARN_X86_INTRINSICS 1
+#include <pmmintrin.h>
+
+static void
+sse3_test_movshdup_reg (float *i1, float *r)
+{
+  __m128 t1 = _mm_loadu_ps (i1);
+  __m128 t2 = _mm_movehdup_ps (t1);
+
+  _mm_storeu_ps (r, t2);
+}
+
+static void
+sse3_test_movshdup_reg_subsume (float *i1, float *r)
+{
+  __m128 t1 = _mm_load_ps (i1);
+  __m128 t2 = _mm_movehdup_ps (t1);
+
+  _mm_storeu_ps (r, t2);
+}
+
+static int
+chk_ps (float *v1, float *v2)
+{
+  int i;
+  int n_fails = 0;
+
+  for (i = 0; i < 4; i++)
+    if (v1[i] != v2[i])
+      n_fails += 1;
+
+  return n_fails;
+}
+
+static float p1[4] __attribute__ ((aligned(16)));
+static float p2[4];
+static float ck[4];
+
+static float vals[] =
+  {
+    100.0, 200.0, 300.0, 400.0, 5.0, -1.0, .345, -21.5,
+    1100.0, 0.235, 321.3, 53.40, 0.3, 10.0, 42.0, 32.52,
+    32.6, 123.3, 1.234, 2.156, 0.1, 3.25, 4.75, 32.44,
+    12.16, 52.34, 64.12, 71.13, -.1, 2.30, 5.12, 3.785,
+    541.3, 321.4, 231.4, 531.4, 71., 321., 231., -531.,
+    23.45, 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, 23.45,
+    23.45, -1.43, -6.74, 6.345, -20.1, -20.1, -40.1, -40.1,
+    1.234, 2.345, 3.456, 4.567, 5.678, 6.789, 7.891, 8.912,
+    -9.32, -8.41, -7.50, -6.59, -5.68, -4.77, -3.86, -2.95,
+    9.32, 8.41, 7.50, 6.59, -5.68, -4.77, -3.86, -2.95
+  };
+
+//static
+void
+TEST (void)
+{
+  int i;
+  int fail = 0;
+
+  for (i = 0; i < sizeof (vals) / sizeof (vals[0]); i += 2)
+    {
+      p1[0] = 0.0;
+      p1[1] = vals[i+0];
+      p1[2] = 1.0;
+      p1[3] = vals[i+1];
+
+      ck[0] = p1[1];
+      ck[1] = p1[1];
+      ck[2] = p1[3];
+      ck[3] = p1[3];
+
+      sse3_test_movshdup_reg (p1, p2);
+
+      fail += chk_ps (ck, p2);
+
+      sse3_test_movshdup_reg_subsume (p1, p2);
+
+      fail += chk_ps (ck, p2);
+    }
+
+  if (fail != 0)
+    abort ();
+}
Index: gcc/testsuite/gcc.target/powerpc/sse3-movsldup.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/sse3-movsldup.c (nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/sse3-movsldup.c (working copy)
@@ -0,0 +1,98 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -mpower8-vector -Wno-psabi" } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-require-effective-target p8vector_hw } */
+
+#ifndef CHECK_H
+#define CHECK_H "sse3-check.h"
+#endif
+
+#include CHECK_H
+
+#ifndef TEST
+#define TEST sse3_test_movsldup_1
+#endif
+
+#define NO_WARN_X86_INTRINSICS 1
+#include <pmmintrin.h>
+
+static void
+sse3_test_movsldup_reg (float *i1, float *r)
+{
+  __m128 t1 = _mm_loadu_ps (i1);
+  __m128 t2 = _mm_moveldup_ps (t1);
+
+  _mm_storeu_ps (r, t2);
+}
+
+static void
+sse3_test_movsldup_reg_subsume (float *i1, float *r)
+{
+  __m128 t1 = _mm_load_ps (i1);
+  __m128 t2 = _mm_moveldup_ps (t1);
+
+  _mm_storeu_ps (r, t2);
+}
+
+static int
+chk_ps (float *v1, float *v2)
+{
+  int i;
+  int n_fails = 0;
+
+  for (i = 0; i < 4; i++)
+    if (v1[i] != v2[i])
+      n_fails += 1;
+
+  return n_fails;
+}
+
+static float p1[4] __attribute__ ((aligned(16)));
+static float p2[4];
+static float ck[4];
+
+static float vals[] =
+  {
+    100.0, 200.0, 300.0, 400.0, 5.0, -1.0, .345, -21.5,
+    1100.0, 0.235, 321.3, 53.40, 0.3, 10.0, 42.0, 32.52,
+    32.6, 123.3, 1.234, 2.156, 0.1, 3.25, 4.75, 32.44,
+    12.16, 52.34, 64.12, 71.13, -.1, 2.30, 5.12, 3.785,
+    541.3, 321.4, 231.4, 531.4, 71., 321., 231., -531.,
+    23.45, 23.45, 23.45, 23.45, 23.45, 23.45, 23.45, 23.45,
+    23.45, -1.43, -6.74, 6.345, -20.1, -20.1, -40.1, -40.1,
+    1.234, 2.345, 3.456, 4.567, 5.678, 6.789, 7.891, 8.912,
+    -9.32, -8.41, -7.50, -6.59, -5.68, -4.77, -3.86, -2.95,
+    9.32, 8.41, 7.50, 6.59, -5.68, -4.77, -3.86, -2.95
+  };
+
+//static
+void
+TEST (void)
+{
+  int i;
+  int fail = 0;
+
+  for (i = 0; i < sizeof (vals) / sizeof (vals[0]); i += 2)
+    {
+      p1[0] = vals[i+0];
+      p1[1] = 0.0;
+      p1[2] = vals[i+1];
+      p1[3] = 1.0;
+
+      ck[0] = p1[0];
+      ck[1] = p1[0];
+      ck[2] = p1[2];
+      ck[3] = p1[2];
+
+      sse3_test_movsldup_reg (p1, p2);
+
+      fail += chk_ps (ck, p2);
+
+      sse3_test_movsldup_reg_subsume (p1, p2);
+
+      fail += chk_ps (ck, p2);
+    }
+
+  if (fail != 0)
+    abort ();
+}
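
For reviewers who want the element semantics in one place: the following is a minimal sketch, not part of the patch, of portable scalar models for the single-precision operations the tests above exercise. It assumes element 0 is the lowest-addressed float, which is how the tests fill p1[], p2[] and ck[]. All names here (ref_addsub_ps, ref_hadd_ps, ref_hsub_ps, ref_movehdup_ps, ref_moveldup_ps, same4, ref_selfcheck) are hypothetical helpers introduced only for illustration; the sample inputs are chosen so every expected result is exactly representable in float.

```c
#include <assert.h>
#include <stddef.h>

/* ADDSUBPS model: subtract in even elements, add in odd elements.  */
static void
ref_addsub_ps (const float *a, const float *b, float *r)
{
  for (size_t i = 0; i < 4; i++)
    r[i] = (i % 2 == 0) ? a[i] - b[i] : a[i] + b[i];
}

/* HADDPS model: pairwise sums of a, followed by pairwise sums of b.  */
static void
ref_hadd_ps (const float *a, const float *b, float *r)
{
  r[0] = a[0] + a[1];
  r[1] = a[2] + a[3];
  r[2] = b[0] + b[1];
  r[3] = b[2] + b[3];
}

/* HSUBPS model: pairwise differences (even minus odd) of a, then of b.  */
static void
ref_hsub_ps (const float *a, const float *b, float *r)
{
  r[0] = a[0] - a[1];
  r[1] = a[2] - a[3];
  r[2] = b[0] - b[1];
  r[3] = b[2] - b[3];
}

/* MOVSHDUP model: duplicate the odd-indexed elements.  */
static void
ref_movehdup_ps (const float *a, float *r)
{
  r[0] = r[1] = a[1];
  r[2] = r[3] = a[3];
}

/* MOVSLDUP model: duplicate the even-indexed elements.  */
static void
ref_moveldup_ps (const float *a, float *r)
{
  r[0] = r[1] = a[0];
  r[2] = r[3] = a[2];
}

/* Return 1 when all four elements compare equal, 0 otherwise.  */
static int
same4 (const float *x, const float *y)
{
  for (size_t i = 0; i < 4; i++)
    if (x[i] != y[i])
      return 0;
  return 1;
}

/* Check each model against hand-computed expectations.  */
static void
ref_selfcheck (void)
{
  const float a[4] = { 100.0f, 200.0f, 300.0f, 400.0f };
  const float b[4] = { 5.0f, -1.0f, 0.5f, -21.5f };
  const float want_addsub[4] = { 95.0f, 199.0f, 299.5f, 378.5f };
  const float want_hadd[4] = { 300.0f, 700.0f, 4.0f, -21.0f };
  const float want_hsub[4] = { -100.0f, -100.0f, 6.0f, 22.0f };
  const float want_hdup[4] = { 200.0f, 200.0f, 400.0f, 400.0f };
  const float want_ldup[4] = { 100.0f, 100.0f, 300.0f, 300.0f };
  float r[4];

  ref_addsub_ps (a, b, r);
  assert (same4 (r, want_addsub));
  ref_hadd_ps (a, b, r);
  assert (same4 (r, want_hadd));
  ref_hsub_ps (a, b, r);
  assert (same4 (r, want_hsub));
  ref_movehdup_ps (a, r);
  assert (same4 (r, want_hdup));
  ref_moveldup_ps (a, r);
  assert (same4 (r, want_ldup));
}
```

The double-precision variants follow the same pattern over two elements, and the lddqu test only checks that an unaligned 128-bit load followed by a store round-trips the data, so neither is modelled here. Each TEST function computes the same expectations inline in ck[] before comparing against the intrinsic result.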