From patchwork Thu Aug 5 02:06:34 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Kewen.Lin" X-Patchwork-Id: 1513705 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=SSVpt7l0; dkim-atps=neutral Received: from sourceware.org (server2.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4GgBp15T1Pz9sW8 for ; Thu, 5 Aug 2021 12:07:36 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id D66E73983C4A for ; Thu, 5 Aug 2021 02:07:32 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org D66E73983C4A DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1628129252; bh=FEMoiKIP4qjojPfjKx+erT2TlsWedYAX5n7zYPz3NW4=; h=To:Subject:Date:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:Cc:From; b=SSVpt7l01PwqP8BLMxeqhhENroskzxmMnMz9Fy127NPm2I3ZsjPgtRCbcceV3OAKi hzhnf/uo7YW4XLkL7CreSRsSdgxDK0KFAIzomjgHlq+C3gkhCJpJzLV3pKrDS8X9Tc 7rRvwSbN+yQZX4XD9TiI/KqzTCJrx9RhCtuQdLkY= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by sourceware.org (Postfix) with ESMTPS id 1859B386486C for ; Thu, 5 Aug 2021 02:06:46 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 1859B386486C Received: from pps.filterd (m0098393.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.43/8.16.0.43) with SMTP id 17523OCZ146558; Wed, 4 Aug 2021 22:06:45 -0400 Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com with ESMTP id 3a835fn1dv-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 04 Aug 2021 22:06:44 -0400 Received: from m0098393.ppops.net (m0098393.ppops.net [127.0.0.1]) by pps.reinject (8.16.0.43/8.16.0.43) with SMTP id 17523lwc000975; Wed, 4 Aug 2021 22:06:44 -0400 Received: from ppma04fra.de.ibm.com (6a.4a.5195.ip4.static.sl-reverse.com [149.81.74.106]) by mx0a-001b2d01.pphosted.com with ESMTP id 3a835fn1d5-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 04 Aug 2021 22:06:44 -0400 Received: from pps.filterd (ppma04fra.de.ibm.com [127.0.0.1]) by ppma04fra.de.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 17523vNW019549; Thu, 5 Aug 2021 02:06:41 GMT Received: from b06cxnps3075.portsmouth.uk.ibm.com (d06relay10.portsmouth.uk.ibm.com [9.149.109.195]) by ppma04fra.de.ibm.com with ESMTP id 3a85q2r1jp-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 05 Aug 2021 02:06:41 +0000 Received: from d06av26.portsmouth.uk.ibm.com (d06av26.portsmouth.uk.ibm.com [9.149.105.62]) by b06cxnps3075.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 17526dG948759104 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 5 Aug 2021 02:06:39 GMT Received: from d06av26.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 34A1EAE06A; Thu, 5 Aug 2021 02:06:39 +0000 (GMT) Received: from d06av26.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 1D833AE061; Thu, 5 Aug 2021 02:06:37 +0000 (GMT) Received: from KewenLins-MacBook-Pro.local (unknown [9.200.41.230]) by d06av26.portsmouth.uk.ibm.com (Postfix) with ESMTP; Thu, 5 Aug 2021 02:06:36 +0000 (GMT) To: GCC Patches Subject: [PATCH] rs6000: Add vec_unpacku_{hi,lo}_v4si Message-ID: <0068d8dd-0e30-e78f-8893-dd24f0f5250a@linux.ibm.com> Date: Thu, 5 Aug 2021 10:06:34 +0800 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Thunderbird/78.10.0 MIME-Version: 1.0 Content-Language: en-US X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: IwQ1MZswUPQ_oI5uXLJwDLLQe_W7r2lQ X-Proofpoint-GUID: 9lZAXTSWiNM1ZqjqQ5QYqfx4pdRKzNTZ X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.391, 18.0.790 definitions=2021-08-04_08:2021-08-04, 2021-08-04 signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 adultscore=0 phishscore=0 suspectscore=0 mlxscore=0 impostorscore=0 clxscore=1015 mlxlogscore=999 malwarescore=0 bulkscore=0 lowpriorityscore=0 spamscore=0 priorityscore=1501 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2107140000 definitions=main-2108050010 X-Spam-Status: No, score=-11.6 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: "Kewen.Lin via Gcc-patches" From: "Kewen.Lin" Reply-To: "Kewen.Lin" Cc: Bill Schmidt , David Edelsohn , Segher Boessenkool Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" Hi, The existing vec_unpacku_{hi,lo} supports emulated unsigned unpacking for short and char but misses the support for int. This patch adds the support for vec_unpacku_{hi,lo}_v4si. Meanwhile, the current implementation uses vector permutation way, which requires one extra customized constant vector as the permutation control vector. It's better to use vector merge high/low with zero constant vector, to save the space in constant area as well as the cost to initialize pcv in prologue. This patch updates it with vector merging and simplify it with iterators. Bootstrapped & regtested on powerpc64le-linux-gnu P9 and powerpc64-linux-gnu P8. btw, the loop in unpack-vectorize-2.c doesn't get vectorized without this patch, unpack-vectorize-[13]* is to verify the vector merging and simplification works expectedly. Is it ok for trunk? BR, Kewen ----- gcc/ChangeLog: * config/rs6000/altivec.md (vec_unpacku_hi_v16qi): Remove. (vec_unpacku_hi_v8hi): Likewise. (vec_unpacku_lo_v16qi): Likewise. (vec_unpacku_lo_v8hi): Likewise. (vec_unpacku_hi_): New define_expand. (vec_unpacku_lo_): Likewise. gcc/testsuite/ChangeLog: * gcc.target/powerpc/unpack-vectorize-1.c: New test. * gcc.target/powerpc/unpack-vectorize-1.h: New test. * gcc.target/powerpc/unpack-vectorize-2.c: New test. * gcc.target/powerpc/unpack-vectorize-2.h: New test. * gcc.target/powerpc/unpack-vectorize-3.c: New test. * gcc.target/powerpc/unpack-vectorize-3.h: New test. * gcc.target/powerpc/unpack-vectorize-run-1.c: New test. * gcc.target/powerpc/unpack-vectorize-run-2.c: New test. * gcc.target/powerpc/unpack-vectorize-run-3.c: New test. * gcc.target/powerpc/unpack-vectorize.h: New test. --- gcc/config/rs6000/altivec.md | 158 ++++-------------- .../gcc.target/powerpc/unpack-vectorize-1.c | 18 ++ .../gcc.target/powerpc/unpack-vectorize-1.h | 14 ++ .../gcc.target/powerpc/unpack-vectorize-2.c | 12 ++ .../gcc.target/powerpc/unpack-vectorize-2.h | 7 + .../gcc.target/powerpc/unpack-vectorize-3.c | 11 ++ .../gcc.target/powerpc/unpack-vectorize-3.h | 7 + .../powerpc/unpack-vectorize-run-1.c | 24 +++ .../powerpc/unpack-vectorize-run-2.c | 16 ++ .../powerpc/unpack-vectorize-run-3.c | 16 ++ .../gcc.target/powerpc/unpack-vectorize.h | 42 +++++ 11 files changed, 196 insertions(+), 129 deletions(-) create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize-1.c create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize-1.h create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize-2.c create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize-2.h create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize-3.c create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize-3.h create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-1.c create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-2.c create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-3.c create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize.h diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md index d70c17e6bc2..0e8b66cd6a5 100644 --- a/gcc/config/rs6000/altivec.md +++ b/gcc/config/rs6000/altivec.md @@ -134,10 +134,8 @@ (define_c_enum "unspec" UNSPEC_VMULWLUH UNSPEC_VMULWHSH UNSPEC_VMULWLSH - UNSPEC_VUPKHUB - UNSPEC_VUPKHUH - UNSPEC_VUPKLUB - UNSPEC_VUPKLUH + UNSPEC_VUPKHUBHW + UNSPEC_VUPKLUBHW UNSPEC_VPERMSI UNSPEC_VPERMHI UNSPEC_INTERHI @@ -3885,143 +3883,45 @@ (define_insn "xxeval" [(set_attr "type" "vecsimple") (set_attr "prefixed" "yes")]) -(define_expand "vec_unpacku_hi_v16qi" - [(set (match_operand:V8HI 0 "register_operand" "=v") - (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")] - UNSPEC_VUPKHUB))] - "TARGET_ALTIVEC" -{ - rtx vzero = gen_reg_rtx (V8HImode); - rtx mask = gen_reg_rtx (V16QImode); - rtvec v = rtvec_alloc (16); - bool be = BYTES_BIG_ENDIAN; - - emit_insn (gen_altivec_vspltish (vzero, const0_rtx)); - - RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, be ? 16 : 7); - RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, be ? 0 : 16); - RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, be ? 16 : 6); - RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, be ? 1 : 16); - RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, be ? 16 : 5); - RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, be ? 2 : 16); - RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, be ? 16 : 4); - RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, be ? 3 : 16); - RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, be ? 16 : 3); - RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, be ? 4 : 16); - RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, be ? 16 : 2); - RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, be ? 5 : 16); - RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, be ? 16 : 1); - RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, be ? 6 : 16); - RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 16 : 0); - RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ? 7 : 16); - - emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v))); - emit_insn (gen_vperm_v16qiv8hi (operands[0], operands[1], vzero, mask)); - DONE; -}) - -(define_expand "vec_unpacku_hi_v8hi" - [(set (match_operand:V4SI 0 "register_operand" "=v") - (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")] - UNSPEC_VUPKHUH))] +(define_expand "vec_unpacku_hi_" + [(set (match_operand:VP 0 "register_operand" "=v") + (unspec:VP [(match_operand: 1 "register_operand" "v")] + UNSPEC_VUPKHUBHW))] "TARGET_ALTIVEC" { - rtx vzero = gen_reg_rtx (V4SImode); - rtx mask = gen_reg_rtx (V16QImode); - rtvec v = rtvec_alloc (16); - bool be = BYTES_BIG_ENDIAN; + rtx vzero = gen_reg_rtx (mode); + emit_insn (gen_altivec_vspltis (vzero, const0_rtx)); - emit_insn (gen_altivec_vspltisw (vzero, const0_rtx)); - - RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, be ? 16 : 7); - RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, be ? 17 : 6); - RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, be ? 0 : 17); - RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, be ? 1 : 16); - RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, be ? 16 : 5); - RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, be ? 17 : 4); - RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, be ? 2 : 17); - RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, be ? 3 : 16); - RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, be ? 16 : 3); - RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, be ? 17 : 2); - RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, be ? 4 : 17); - RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, be ? 5 : 16); - RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, be ? 16 : 1); - RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, be ? 17 : 0); - RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 6 : 17); - RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ? 7 : 16); - - emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v))); - emit_insn (gen_vperm_v8hiv4si (operands[0], operands[1], vzero, mask)); - DONE; -}) + rtx res = gen_reg_rtx (mode); + rtx op1 = operands[1]; -(define_expand "vec_unpacku_lo_v16qi" - [(set (match_operand:V8HI 0 "register_operand" "=v") - (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")] - UNSPEC_VUPKLUB))] - "TARGET_ALTIVEC" -{ - rtx vzero = gen_reg_rtx (V8HImode); - rtx mask = gen_reg_rtx (V16QImode); - rtvec v = rtvec_alloc (16); - bool be = BYTES_BIG_ENDIAN; - - emit_insn (gen_altivec_vspltish (vzero, const0_rtx)); - - RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, be ? 16 : 15); - RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, be ? 8 : 16); - RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, be ? 16 : 14); - RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, be ? 9 : 16); - RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, be ? 16 : 13); - RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, be ? 10 : 16); - RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, be ? 16 : 12); - RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, be ? 11 : 16); - RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, be ? 16 : 11); - RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, be ? 12 : 16); - RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, be ? 16 : 10); - RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, be ? 13 : 16); - RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, be ? 16 : 9); - RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, be ? 14 : 16); - RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 16 : 8); - RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ? 15 : 16); + if (BYTES_BIG_ENDIAN) + emit_insn (gen_altivec_vmrgh (res, vzero, op1)); + else + emit_insn (gen_altivec_vmrgl (res, op1, vzero)); - emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v))); - emit_insn (gen_vperm_v16qiv8hi (operands[0], operands[1], vzero, mask)); + emit_insn (gen_move_insn (operands[0], gen_lowpart (mode, res))); DONE; }) -(define_expand "vec_unpacku_lo_v8hi" - [(set (match_operand:V4SI 0 "register_operand" "=v") - (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")] - UNSPEC_VUPKLUH))] +(define_expand "vec_unpacku_lo_" + [(set (match_operand:VP 0 "register_operand" "=v") + (unspec:VP [(match_operand: 1 "register_operand" "v")] + UNSPEC_VUPKLUBHW))] "TARGET_ALTIVEC" { - rtx vzero = gen_reg_rtx (V4SImode); - rtx mask = gen_reg_rtx (V16QImode); - rtvec v = rtvec_alloc (16); - bool be = BYTES_BIG_ENDIAN; + rtx vzero = gen_reg_rtx (mode); + emit_insn (gen_altivec_vspltis (vzero, const0_rtx)); - emit_insn (gen_altivec_vspltisw (vzero, const0_rtx)); - - RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, be ? 16 : 15); - RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, be ? 17 : 14); - RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, be ? 8 : 17); - RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, be ? 9 : 16); - RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, be ? 16 : 13); - RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, be ? 17 : 12); - RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, be ? 10 : 17); - RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, be ? 11 : 16); - RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, be ? 16 : 11); - RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, be ? 17 : 10); - RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, be ? 12 : 17); - RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, be ? 13 : 16); - RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, be ? 16 : 9); - RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, be ? 17 : 8); - RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 14 : 17); - RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ? 15 : 16); + rtx res = gen_reg_rtx (mode); + rtx op1 = operands[1]; - emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v))); - emit_insn (gen_vperm_v8hiv4si (operands[0], operands[1], vzero, mask)); + if (BYTES_BIG_ENDIAN) + emit_insn (gen_altivec_vmrgl (res, vzero, op1)); + else + emit_insn (gen_altivec_vmrgh (res, op1, vzero)); + + emit_insn (gen_move_insn (operands[0], gen_lowpart (mode, res))); DONE; }) diff --git a/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-1.c b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-1.c new file mode 100644 index 00000000000..2621d753baa --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-1.c @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_altivec_ok } */ +/* { dg-options "-maltivec -O2 -ftree-vectorize -fno-vect-cost-model -fdump-tree-vect-details" } */ + +/* Test if unpack vectorization succeeds for type signed/unsigned + short and char. */ + +#include "unpack-vectorize-1.h" + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect" } } */ +/* { dg-final { scan-assembler {\mvupkhsb\M} } } */ +/* { dg-final { scan-assembler {\mvupklsb\M} } } */ +/* { dg-final { scan-assembler {\mvupkhsh\M} } } */ +/* { dg-final { scan-assembler {\mvupklsh\M} } } */ +/* { dg-final { scan-assembler {\mvmrghb\M} } } */ +/* { dg-final { scan-assembler {\mvmrglb\M} } } */ +/* { dg-final { scan-assembler {\mvmrghh\M} } } */ +/* { dg-final { scan-assembler {\mvmrglh\M} } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-1.h b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-1.h new file mode 100644 index 00000000000..1cb89aba392 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-1.h @@ -0,0 +1,14 @@ +#include "unpack-vectorize.h" + +DEF_ARR (si) +DEF_ARR (ui) +DEF_ARR (sh) +DEF_ARR (uh) +DEF_ARR (sc) +DEF_ARR (uc) + +TEST1 (sh, si) +TEST1 (uh, ui) +TEST1 (sc, sh) +TEST1 (uc, uh) + diff --git a/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-2.c b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-2.c new file mode 100644 index 00000000000..3e7e97da43c --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-2.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_vsx_ok } */ +/* { dg-options "-mdejagnu-cpu=power7 -O2 -ftree-vectorize -fno-vect-cost-model -fdump-tree-vect-details" } */ + +/* Test if unsigned int unpack vectorization succeeds. V2DImode is + supported since Power7 so guard it under Power7 and up. */ + +#include "unpack-vectorize-2.h" + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ +/* { dg-final { scan-assembler {\mxxmrghw\M} } } */ +/* { dg-final { scan-assembler {\mxxmrglw\M} } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-2.h b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-2.h new file mode 100644 index 00000000000..e199229e6f7 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-2.h @@ -0,0 +1,7 @@ +#include "unpack-vectorize.h" + +DEF_ARR (ui) +DEF_ARR (ull) + +TEST1 (ui, ull) + diff --git a/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-3.c b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-3.c new file mode 100644 index 00000000000..a246e7e26b6 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-3.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_p8vector_ok } */ +/* { dg-options "-mdejagnu-cpu=power8 -O2 -ftree-vectorize -fno-vect-cost-model -fdump-tree-vect-details" } */ + +/* Test if signed int unpack vectorization succeeds. */ + +#include "unpack-vectorize-3.h" + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ +/* { dg-final { scan-assembler {\mvupkhsw\M} } } */ +/* { dg-final { scan-assembler {\mvupklsw\M} } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-3.h b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-3.h new file mode 100644 index 00000000000..6a5191d28a7 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-3.h @@ -0,0 +1,7 @@ +#include "unpack-vectorize.h" + +DEF_ARR (si) +DEF_ARR (sll) + +TEST1 (si, sll) + diff --git a/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-1.c b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-1.c new file mode 100644 index 00000000000..51f0e67524f --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-1.c @@ -0,0 +1,24 @@ +/* { dg-do run } */ +/* { dg-require-effective-target vmx_hw } */ +/* { dg-options "-maltivec -O2 -ftree-vectorize -fno-vect-cost-model" } */ + +#include "unpack-vectorize-1.h" + +/* Test if unpack vectorization cases on signed/unsigned short and char + run successfully. */ + +CHECK1 (sh, si) +CHECK1 (uh, ui) +CHECK1 (sc, sh) +CHECK1 (uc, uh) + +int +main () +{ + check1_sh_si (); + check1_uh_ui (); + check1_sc_sh (); + check1_uc_uh (); + + return 0; +} diff --git a/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-2.c b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-2.c new file mode 100644 index 00000000000..6d243602bbf --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-2.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-require-effective-target vsx_hw } */ +/* { dg-options "-mdejagnu-cpu=power7 -O2 -ftree-vectorize -fno-vect-cost-model" } */ + +#include "unpack-vectorize-2.h" + +/* Test if unpack vectorization cases on unsigned int run successfully. */ + +CHECK1 (ui, ull) + +int +main () +{ + check1_ui_ull (); + return 0; +} diff --git a/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-3.c b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-3.c new file mode 100644 index 00000000000..fec33c46abc --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-3.c @@ -0,0 +1,16 @@ +/* { dg-do run } */ +/* { dg-require-effective-target p8vector_hw } */ +/* { dg-options "-mdejagnu-cpu=power8 -O2 -ftree-vectorize -fno-vect-cost-model" } */ + +#include "unpack-vectorize-3.h" + +/* Test if unpack vectorization cases on signed int run successfully. */ + +CHECK1 (si, sll) + +int +main () +{ + check1_si_sll (); + return 0; +} diff --git a/gcc/testsuite/gcc.target/powerpc/unpack-vectorize.h b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize.h new file mode 100644 index 00000000000..11fa7d4aa6f --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize.h @@ -0,0 +1,42 @@ +typedef signed long long sll; +typedef unsigned long long ull; +typedef signed int si; +typedef unsigned int ui; +typedef signed short sh; +typedef unsigned short uh; +typedef signed char sc; +typedef unsigned char uc; + +#ifndef ALIGN +#define ALIGN 32 +#endif + +#define ALIGN_ATTR __attribute__((__aligned__(ALIGN))) + +#define N 128 + +#define DEF_ARR(TYPE) \ + TYPE TYPE##_a[N] ALIGN_ATTR; \ + TYPE TYPE##_b[N] ALIGN_ATTR; \ + TYPE TYPE##_c[N] ALIGN_ATTR; + +#define TEST1(NTYPE, WTYPE) \ + __attribute__((noipa)) void test1_##NTYPE##_##WTYPE() { \ + for (int i = 0; i < N; i++) \ + WTYPE##_c[i] = NTYPE##_a[i] + NTYPE##_b[i]; \ + } + +#define CHECK1(NTYPE, WTYPE) \ + __attribute__((noipa, optimize(0))) void check1_##NTYPE##_##WTYPE() { \ + for (int i = 0; i < N; i++) { \ + NTYPE##_a[i] = 2 * i * sizeof(NTYPE) + 10; \ + NTYPE##_b[i] = 7 * i * sizeof(NTYPE) / 5 - 10; \ + } \ + test1_##NTYPE##_##WTYPE(); \ + for (int i = 0; i < N; i++) { \ + WTYPE exp = NTYPE##_a[i] + NTYPE##_b[i]; \ + if (WTYPE##_c[i] != exp) \ + __builtin_abort(); \ + } \ + } +