From patchwork Tue Aug 26 23:52:10 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Bill Schmidt X-Patchwork-Id: 383253 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 36860140081 for ; Wed, 27 Aug 2014 09:52:06 +1000 (EST) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :message-id:subject:from:to:cc:date:content-type:mime-version :content-transfer-encoding; q=dns; s=default; b=LqUgG/9NpOTqsrK2 Ce65K04qrJc3bkpc+njrHa9SthoBU8g03vftDOHFvnceP4KMTDm2Gao+TkjS3AhH LUaa5d1e6Oq+PRl1EqAJPeld/eRZNoIryGGj3/aJNC+s/HNtLTkdx57EAyTLQDKw fOMmjLskcSvp/ePOhNQSqyKx1W8= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :message-id:subject:from:to:cc:date:content-type:mime-version :content-transfer-encoding; s=default; bh=nYXoOyojUA1q5BGOp0K1XI NTFOM=; b=V2o8r25Fo5Fg6xCzPi1aiAW0YhQTVRkF1A5OUwHG/wF7xXEEkDv8az tDAioXKafW6NDEbVmDkGk9PY/4siqYUVNHFJmIAyQXWpvxs/AsDBUL5zviHkGIUZ iaUpRROKcRNgCx+nksWSi6VJSLx+CRQex16v6QQwRov9wRja6NIKM= Received: (qmail 16845 invoked by alias); 26 Aug 2014 23:51:59 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 16834 invoked by uid 89); 26 Aug 2014 23:51:58 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.5 required=5.0 tests=AWL, BAYES_00, RP_MATCHES_RCVD autolearn=ham version=3.3.2 X-HELO: e31.co.us.ibm.com Received: from e31.co.us.ibm.com (HELO e31.co.us.ibm.com) (32.97.110.149) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES256-SHA encrypted) ESMTPS; Tue, 26 Aug 2014 23:51:57 +0000 Received: from /spool/local by e31.co.us.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Tue, 26 Aug 2014 17:51:55 -0600 Received: from d03dlp03.boulder.ibm.com (9.17.202.179) by e31.co.us.ibm.com (192.168.1.131) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; Tue, 26 Aug 2014 17:51:53 -0600 Received: from b03cxnp07029.gho.boulder.ibm.com (b03cxnp07029.gho.boulder.ibm.com [9.17.130.16]) by d03dlp03.boulder.ibm.com (Postfix) with ESMTP id D120519D8026 for ; Tue, 26 Aug 2014 17:51:40 -0600 (MDT) Received: from d03av04.boulder.ibm.com (d03av04.boulder.ibm.com [9.17.195.170]) by b03cxnp07029.gho.boulder.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id s7QLluCm11272648 for ; Tue, 26 Aug 2014 23:47:56 +0200 Received: from d03av04.boulder.ibm.com (loopback [127.0.0.1]) by d03av04.boulder.ibm.com (8.14.4/8.14.4/NCO v10.0 AVout) with ESMTP id s7QNpqah030875 for ; Tue, 26 Aug 2014 17:51:52 -0600 Received: from [9.50.19.136] ([9.50.19.136]) by d03av04.boulder.ibm.com (8.14.4/8.14.4/NCO v10.0 AVin) with ESMTP id s7QNpqgp030837; Tue, 26 Aug 2014 17:51:52 -0600 Message-ID: <1409097130.25590.72.camel@gnopaine> Subject: [PATCH,rs6000] Add some more vector built-ins From: Bill Schmidt To: gcc-patches@gcc.gnu.org Cc: dje.gcc@gmail.com Date: Tue, 26 Aug 2014 18:52:10 -0500 Mime-Version: 1.0 X-TM-AS-MML: disable X-Content-Scanned: Fidelis XPS MAILER x-cbid: 14082623-8236-0000-0000-000004EB00B7 X-IsSubscribed: yes Hi, This patch adds a few more cases of overloaded vector built-ins to support V2DI and V2DF modes: vec_xl, vec_xst, vec_splat, vec_div, vec_mul, vec_round. These are all straightforward. For vec_div and vec_mul, the most efficient thing appears to be to just scalarize these; at least I couldn't come up with a magic sequence for multiply at this width. Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no regressions. Is this ok for trunk? Thanks, Bill [gcc] 2014-08-26 Bill Schmidt * config/rs6000/altivec.h (vec_xl): New #define. (vec_xst): Likewise. * config/rs6000/rs6000-builtin.def (XXSPLTD_V2DF): New built-in. (XXSPLTD_V2DI): Likewise. (DIV_V2DI): Likewise. (UDIV_V2DI): Likewise. (MUL_V2DI): Likewise. * config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Add entries for VSX_BUILTIN_XVRDPI, VSX_BUILTIN_DIV_V2DI, VSX_BUILTIN_UDIV_V2DI, VSX_BUILTIN_MUL_V2DI, VSX_BUILTIN_XXSPLTD_V2DF, and VSX_BUILTIN_XXSPLTD_V2DI). * config/rs6000/vsx.md (UNSPEC_VSX_XXSPLTD): New unspec. (UNSPEC_VSX_DIVSD): Likewise. (UNSPEC_VSX_DIVUD): Likewise. (UNSPEC_VSX_MULSD): Likewise. (vsx_mul_v2di): New insn-and-split. (vsx_div_v2di): Likewise. (vsx_udiv_v2di): Likewise. (vsx_xxspltd_): New insn. [gcc/testsuite] 2014-08-26 Bill Schmidt * gcc.target/powerpc/builtins-1.c: Add tests for vec_xl, vec_xst, vec_round, vec_splat, vec_div, and vec_mul. * gcc.target/powerpc/builtins-2.c: New test. Index: gcc/config/rs6000/altivec.h =================================================================== --- gcc/config/rs6000/altivec.h (revision 214281) +++ gcc/config/rs6000/altivec.h (working copy) @@ -322,6 +322,8 @@ #define vec_sqrt __builtin_vec_sqrt #define vec_vsx_ld __builtin_vec_vsx_ld #define vec_vsx_st __builtin_vec_vsx_st +#define vec_xl __builtin_vec_vsx_ld +#define vec_xst __builtin_vec_vsx_st /* Note, xxsldi and xxpermdi were added as __builtin_vsx_ functions instead of __builtin_vec_ */ Index: gcc/config/rs6000/rs6000-builtin.def =================================================================== --- gcc/config/rs6000/rs6000-builtin.def (revision 214281) +++ gcc/config/rs6000/rs6000-builtin.def (working copy) @@ -1258,6 +1258,11 @@ BU_VSX_2 (VEC_MERGEL_V2DF, "mergel_2df", CONST, BU_VSX_2 (VEC_MERGEL_V2DI, "mergel_2di", CONST, vsx_mergel_v2di) BU_VSX_2 (VEC_MERGEH_V2DF, "mergeh_2df", CONST, vsx_mergeh_v2df) BU_VSX_2 (VEC_MERGEH_V2DI, "mergeh_2di", CONST, vsx_mergeh_v2di) +BU_VSX_2 (XXSPLTD_V2DF, "xxspltd_2df", CONST, vsx_xxspltd_v2df) +BU_VSX_2 (XXSPLTD_V2DI, "xxspltd_2di", CONST, vsx_xxspltd_v2di) +BU_VSX_2 (DIV_V2DI, "div_2di", CONST, vsx_div_v2di) +BU_VSX_2 (UDIV_V2DI, "udiv_2di", CONST, vsx_udiv_v2di) +BU_VSX_2 (MUL_V2DI, "mul_2di", CONST, vsx_mul_v2di) /* VSX abs builtin functions. */ BU_VSX_A (XVABSDP, "xvabsdp", CONST, absv2df2) Index: gcc/config/rs6000/rs6000-c.c =================================================================== --- gcc/config/rs6000/rs6000-c.c (revision 214281) +++ gcc/config/rs6000/rs6000-c.c (working copy) @@ -604,6 +604,8 @@ const struct altivec_builtin_types altivec_overloa RS6000_BTI_V2DF, RS6000_BTI_V2DF, 0, 0 }, { ALTIVEC_BUILTIN_VEC_ROUND, ALTIVEC_BUILTIN_VRFIN, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0, 0 }, + { ALTIVEC_BUILTIN_VEC_ROUND, VSX_BUILTIN_XVRDPI, + RS6000_BTI_V2DF, RS6000_BTI_V2DF, 0, 0 }, { ALTIVEC_BUILTIN_VEC_RECIP, ALTIVEC_BUILTIN_VRECIPFP, RS6000_BTI_V4SF, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 }, { ALTIVEC_BUILTIN_VEC_RECIP, VSX_BUILTIN_RECIP_V2DF, @@ -1161,6 +1163,10 @@ const struct altivec_builtin_types altivec_overloa RS6000_BTI_V4SF, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 }, { VSX_BUILTIN_VEC_DIV, VSX_BUILTIN_XVDIVDP, RS6000_BTI_V2DF, RS6000_BTI_V2DF, RS6000_BTI_V2DF, 0 }, + { VSX_BUILTIN_VEC_DIV, VSX_BUILTIN_DIV_V2DI, + RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 }, + { VSX_BUILTIN_VEC_DIV, VSX_BUILTIN_UDIV_V2DI, + RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 }, { ALTIVEC_BUILTIN_VEC_LD, ALTIVEC_BUILTIN_LVX_V2DF, RS6000_BTI_V2DF, RS6000_BTI_INTSI, ~RS6000_BTI_V2DF, 0 }, { ALTIVEC_BUILTIN_VEC_LD, ALTIVEC_BUILTIN_LVX_V2DI, @@ -1822,6 +1828,10 @@ const struct altivec_builtin_types altivec_overloa RS6000_BTI_V4SF, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 }, { VSX_BUILTIN_VEC_MUL, VSX_BUILTIN_XVMULDP, RS6000_BTI_V2DF, RS6000_BTI_V2DF, RS6000_BTI_V2DF, 0 }, + { VSX_BUILTIN_VEC_MUL, VSX_BUILTIN_MUL_V2DI, + RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 }, + { VSX_BUILTIN_VEC_MUL, VSX_BUILTIN_MUL_V2DI, + RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 }, { ALTIVEC_BUILTIN_VEC_MULE, ALTIVEC_BUILTIN_VMULEUB, RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0 }, { ALTIVEC_BUILTIN_VEC_MULE, ALTIVEC_BUILTIN_VMULESB, @@ -2204,6 +2214,14 @@ const struct altivec_builtin_types altivec_overloa RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, RS6000_BTI_INTSI, 0 }, { ALTIVEC_BUILTIN_VEC_SPLAT, ALTIVEC_BUILTIN_VSPLTW, RS6000_BTI_bool_V4SI, RS6000_BTI_bool_V4SI, RS6000_BTI_INTSI, 0 }, + { ALTIVEC_BUILTIN_VEC_SPLAT, VSX_BUILTIN_XXSPLTD_V2DF, + RS6000_BTI_V2DF, RS6000_BTI_V2DF, RS6000_BTI_INTSI, 0 }, + { ALTIVEC_BUILTIN_VEC_SPLAT, VSX_BUILTIN_XXSPLTD_V2DI, + RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_INTSI, 0 }, + { ALTIVEC_BUILTIN_VEC_SPLAT, VSX_BUILTIN_XXSPLTD_V2DI, + RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_INTSI, 0 }, + { ALTIVEC_BUILTIN_VEC_SPLAT, VSX_BUILTIN_XXSPLTD_V2DI, + RS6000_BTI_bool_V2DI, RS6000_BTI_bool_V2DI, RS6000_BTI_INTSI, 0 }, { ALTIVEC_BUILTIN_VEC_VSPLTW, ALTIVEC_BUILTIN_VSPLTW, RS6000_BTI_V4SF, RS6000_BTI_V4SF, RS6000_BTI_INTSI, 0 }, { ALTIVEC_BUILTIN_VEC_VSPLTW, ALTIVEC_BUILTIN_VSPLTW, Index: gcc/config/rs6000/vsx.md =================================================================== --- gcc/config/rs6000/vsx.md (revision 214281) +++ gcc/config/rs6000/vsx.md (working copy) @@ -260,6 +260,10 @@ UNSPEC_VSX_ROUND_IC UNSPEC_VSX_SLDWI UNSPEC_VSX_XXSPLTW + UNSPEC_VSX_XXSPLTD + UNSPEC_VSX_DIVSD + UNSPEC_VSX_DIVUD + UNSPEC_VSX_MULSD ]) ;; VSX moves @@ -747,6 +751,34 @@ [(set_attr "type" "") (set_attr "fp_type" "")]) +; Emulate vector with scalar for vec_mul in V2DImode +(define_insn_and_split "vsx_mul_v2di" + [(set (match_operand:V2DI 0 "vsx_register_operand" "=wa") + (unspec:V2DI [(match_operand:V2DI 1 "vsx_register_operand" "wa") + (match_operand:V2DI 2 "vsx_register_operand" "wa")] + UNSPEC_VSX_MULSD))] + "VECTOR_MEM_VSX_P (V2DImode)" + "#" + "VECTOR_MEM_VSX_P (V2DImode) && !reload_completed && !reload_in_progress" + [(const_int 0)] + " +{ + rtx op0 = operands[0]; + rtx op1 = operands[1]; + rtx op2 = operands[2]; + rtx op3 = gen_reg_rtx (DImode); + rtx op4 = gen_reg_rtx (DImode); + rtx op5 = gen_reg_rtx (DImode); + emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (0))); + emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (0))); + emit_insn (gen_muldi3 (op5, op3, op4)); + emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (1))); + emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (1))); + emit_insn (gen_muldi3 (op3, op3, op4)); + emit_insn (gen_vsx_concat_v2di (op0, op5, op3)); +}" + [(set_attr "type" "mul")]) + (define_insn "*vsx_div3" [(set (match_operand:VSX_F 0 "vsx_register_operand" "=,?") (div:VSX_F (match_operand:VSX_F 1 "vsx_register_operand" ",") @@ -756,6 +788,61 @@ [(set_attr "type" "") (set_attr "fp_type" "")]) +; Emulate vector with scalar for vec_div in V2DImode +(define_insn_and_split "vsx_div_v2di" + [(set (match_operand:V2DI 0 "vsx_register_operand" "=wa") + (unspec:V2DI [(match_operand:V2DI 1 "vsx_register_operand" "wa") + (match_operand:V2DI 2 "vsx_register_operand" "wa")] + UNSPEC_VSX_DIVSD))] + "VECTOR_MEM_VSX_P (V2DImode)" + "#" + "VECTOR_MEM_VSX_P (V2DImode) && !reload_completed && !reload_in_progress" + [(const_int 0)] + " +{ + rtx op0 = operands[0]; + rtx op1 = operands[1]; + rtx op2 = operands[2]; + rtx op3 = gen_reg_rtx (DImode); + rtx op4 = gen_reg_rtx (DImode); + rtx op5 = gen_reg_rtx (DImode); + emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (0))); + emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (0))); + emit_insn (gen_divdi3 (op5, op3, op4)); + emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (1))); + emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (1))); + emit_insn (gen_divdi3 (op3, op3, op4)); + emit_insn (gen_vsx_concat_v2di (op0, op5, op3)); +}" + [(set_attr "type" "div")]) + +(define_insn_and_split "vsx_udiv_v2di" + [(set (match_operand:V2DI 0 "vsx_register_operand" "=wa") + (unspec:V2DI [(match_operand:V2DI 1 "vsx_register_operand" "wa") + (match_operand:V2DI 2 "vsx_register_operand" "wa")] + UNSPEC_VSX_DIVUD))] + "VECTOR_MEM_VSX_P (V2DImode)" + "#" + "VECTOR_MEM_VSX_P (V2DImode) && !reload_completed && !reload_in_progress" + [(const_int 0)] + " +{ + rtx op0 = operands[0]; + rtx op1 = operands[1]; + rtx op2 = operands[2]; + rtx op3 = gen_reg_rtx (DImode); + rtx op4 = gen_reg_rtx (DImode); + rtx op5 = gen_reg_rtx (DImode); + emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (0))); + emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (0))); + emit_insn (gen_udivdi3 (op5, op3, op4)); + emit_insn (gen_vsx_extract_v2di (op3, op1, GEN_INT (1))); + emit_insn (gen_vsx_extract_v2di (op4, op2, GEN_INT (1))); + emit_insn (gen_udivdi3 (op3, op3, op4)); + emit_insn (gen_vsx_concat_v2di (op0, op5, op3)); +}" + [(set_attr "type" "div")]) + ;; *tdiv* instruction returning the FG flag (define_expand "vsx_tdiv3_fg" [(set (match_dup 3) @@ -1898,6 +1985,22 @@ "xxspltw %x0,%x1,%2" [(set_attr "type" "vecperm")]) +;; V2DF/V2DI splat for use by vec_splat builtin +(define_insn "vsx_xxspltd_" + [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa") + (unspec:VSX_D [(match_operand:VSX_D 1 "vsx_register_operand" "wa") + (match_operand:QI 2 "u5bit_cint_operand" "i")] + UNSPEC_VSX_XXSPLTD))] + "VECTOR_MEM_VSX_P (mode)" +{ + if ((VECTOR_ELT_ORDER_BIG && INTVAL (operands[2]) == 0) + || (!VECTOR_ELT_ORDER_BIG && INTVAL (operands[2]) == 1)) + return "xxpermdi %x0,%x1,%x1,0"; + else + return "xxpermdi %x0,%x1,%x1,3"; +} + [(set_attr "type" "vecperm")]) + ;; V4SF/V4SI interleave (define_insn "vsx_xxmrghw_" [(set (match_operand:VSX_W 0 "vsx_register_operand" "=wf,?") Index: gcc/testsuite/gcc.target/powerpc/builtins-1.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/builtins-1.c (revision 214281) +++ gcc/testsuite/gcc.target/powerpc/builtins-1.c (working copy) @@ -6,6 +6,9 @@ #include +vector double y = { 2.0, 4.0 }; +vector double z; + int main () { vector float fa = {1.0, 2.0, 3.0, -4.0}; @@ -134,5 +137,25 @@ int main () vector signed char scb = vec_cntlz (sca); vector unsigned char cb = vec_cntlz (ca); + vector double dd = vec_xl (0, &y); + vec_xst (dd, 0, &z); + + vector double de = vec_round (dd); + + vector double df = vec_splat (de, 0); + vector double dg = vec_splat (de, 1); + vector long long l3 = vec_splat (l2, 0); + vector long long l4 = vec_splat (l2, 1); + vector unsigned long long u3 = vec_splat (u2, 0); + vector unsigned long long u4 = vec_splat (u2, 1); + vector bool long long l5 = vec_splat (ld, 0); + vector bool long long l6 = vec_splat (ld, 1); + + vector long long l7 = vec_div (l3, l4); + vector unsigned long long u5 = vec_div (u3, u4); + + vector long long l8 = vec_mul (l3, l4); + vector unsigned long long u6 = vec_mul (u3, u4); + return 0; } Index: gcc/testsuite/gcc.target/powerpc/builtins-2.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/builtins-2.c (revision 0) +++ gcc/testsuite/gcc.target/powerpc/builtins-2.c (working copy) @@ -0,0 +1,38 @@ +/* { dg-do run { target { powerpc64le-*-* } } } */ +/* { dg-options "-mcpu=power8 " } */ + +#include + +void abort (void); + +int main () +{ + vector long long sa = {27L, -14L}; + vector long long sb = {-9L, -2L}; + + vector unsigned long long ua = {27L, 14L}; + vector unsigned long long ub = {9L, 2L}; + + vector long long sc = vec_div (sa, sb); + vector unsigned long long uc = vec_div (ua, ub); + + if (sc[0] != -3L || sc[1] != 7L || uc[0] != 3L || uc[1] != 7L) + abort (); + + vector long long sd = vec_mul (sa, sb); + vector unsigned long long ud = vec_mul (ua, ub); + + if (sd[0] != -243L || sd[1] != 28L || ud[0] != 243L || ud[1] != 28L) + abort (); + + vector long long se = vec_splat (sa, 0); + vector long long sf = vec_splat (sa, 1); + vector unsigned long long ue = vec_splat (ua, 0); + vector unsigned long long uf = vec_splat (ua, 1); + + if (se[0] != 27L || se[1] != 27L || sf[0] != -14L || sf[1] != -14L + || ue[0] != 27L || ue[1] != 27L || uf[0] != 14L || uf[1] != 14L) + abort (); + + return 0; +}