From patchwork Wed Aug 11 06:56:11 2021
X-Patchwork-Submitter: "Kewen.Lin"
X-Patchwork-Id: 1515614
From: "Kewen.Lin"
To: GCC Patches
Cc: Bill Schmidt, David Edelsohn, Segher Boessenkool
Subject: [PATCH] rs6000: Make some BIFs vectorized on P10
Date: Wed, 11 Aug 2021 14:56:11 +0800
Message-ID: <0f77a46a-4f13-65b8-cb8c-5fd9ed17537c@linux.ibm.com>

Hi,

This patch adds support for the vectorizer to vectorize the scalar
versions of some built-in functions using their corresponding vector
versions when Power10 is supported.
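
To illustrate the kind of loop this is aimed at, here is a small sketch
for discussion (it is not part of the patch).  ref_divwe is a
hypothetical, portable stand-in for what __builtin_divwe computes per
element, assuming a non-zero divisor and a quotient that fits in 32 bits
(the result is undefined otherwise).  divwe_loop is also a made-up name;
it has the same shape as test_divwe in the new tests, where the call in
the loop body is the scalar built-in that should now be mappable to
vdivesw on Power10.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical portable model of the scalar divwe operation: the
   dividend is the first operand followed by 32 zero bits, i.e.
   a * 2^32, divided by b.  Only meaningful when b != 0 and the
   quotient fits in 32 bits.  */
static int32_t
ref_divwe (int32_t a, int32_t b)
{
  assert (b != 0);
  /* Multiply rather than shift so that negative values of a are
     well defined in C.  */
  return (int32_t) (((int64_t) a * 4294967296LL) / b);
}

/* Element-wise loop with the same shape as test_divwe: one scalar
   call per iteration and no cross-iteration dependence.  */
static void
divwe_loop (const int32_t *a, const int32_t *b, int32_t *c, int n)
{
  for (int i = 0; i < n; i++)
    c[i] = ref_divwe (a[i], b[i]);
}
```

With the real built-in in place of ref_divwe, four such 32-bit divisions
would correspond to one V4SI vector operation, and two 64-bit divde
divisions to one V2DI operation.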

Bootstrapped & regtested on powerpc64le-linux-gnu {P9,P10} and
powerpc64-linux-gnu P8.

Is it ok for trunk?

BR,
Kewen
-----
gcc/ChangeLog:

        * config/rs6000/rs6000.c (rs6000_builtin_md_vectorized_function): Add
        support for some built-in functions vectorized on Power10.

gcc/testsuite/ChangeLog:

        * gcc.target/powerpc/dive-vectorize-1.c: New test.
        * gcc.target/powerpc/dive-vectorize-1.h: New test.
        * gcc.target/powerpc/dive-vectorize-2.c: New test.
        * gcc.target/powerpc/dive-vectorize-2.h: New test.
        * gcc.target/powerpc/dive-vectorize-run-1.c: New test.
        * gcc.target/powerpc/dive-vectorize-run-2.c: New test.
        * gcc.target/powerpc/p10-bifs-vectorize-1.c: New test.
        * gcc.target/powerpc/p10-bifs-vectorize-1.h: New test.
        * gcc.target/powerpc/p10-bifs-vectorize-run-1.c: New test.
---
 gcc/config/rs6000/rs6000.c                    | 55 +++++++++++++++++++
 .../gcc.target/powerpc/dive-vectorize-1.c     | 11 ++++
 .../gcc.target/powerpc/dive-vectorize-1.h     | 22 ++++++++
 .../gcc.target/powerpc/dive-vectorize-2.c     | 12 ++++
 .../gcc.target/powerpc/dive-vectorize-2.h     | 22 ++++++++
 .../gcc.target/powerpc/dive-vectorize-run-1.c | 52 ++++++++++++++++++
 .../gcc.target/powerpc/dive-vectorize-run-2.c | 53 ++++++++++++++++++
 .../gcc.target/powerpc/p10-bifs-vectorize-1.c | 15 +++++
 .../gcc.target/powerpc/p10-bifs-vectorize-1.h | 40 ++++++++++++++
 .../powerpc/p10-bifs-vectorize-run-1.c        | 45 +++++++++++++++
 10 files changed, 327 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/dive-vectorize-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/dive-vectorize-1.h
 create mode 100644 gcc/testsuite/gcc.target/powerpc/dive-vectorize-2.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/dive-vectorize-2.h
 create mode 100644 gcc/testsuite/gcc.target/powerpc/dive-vectorize-run-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/dive-vectorize-run-2.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/p10-bifs-vectorize-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/p10-bifs-vectorize-1.h
 create mode 100644 gcc/testsuite/gcc.target/powerpc/p10-bifs-vectorize-run-1.c

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 279f00cc648..3eac1d05101 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -5785,6 +5785,61 @@ rs6000_builtin_md_vectorized_function (tree fndecl, tree type_out,
     default:
       break;
     }
+
+  machine_mode in_vmode = TYPE_MODE (type_in);
+  machine_mode out_vmode = TYPE_MODE (type_out);
+
+  /* Power10 supported vectorized built-in functions.  */
+  if (TARGET_POWER10
+      && in_vmode == out_vmode
+      && VECTOR_UNIT_ALTIVEC_OR_VSX_P (in_vmode))
+    {
+      machine_mode exp_mode = DImode;
+      machine_mode exp_vmode = V2DImode;
+      enum rs6000_builtins vname = RS6000_BUILTIN_COUNT;
+      switch (fn)
+        {
+        case MISC_BUILTIN_DIVWE:
+        case MISC_BUILTIN_DIVWEU:
+          exp_mode = SImode;
+          exp_vmode = V4SImode;
+          if (fn == MISC_BUILTIN_DIVWE)
+            vname = P10V_BUILTIN_DIVES_V4SI;
+          else
+            vname = P10V_BUILTIN_DIVEU_V4SI;
+          break;
+        case MISC_BUILTIN_DIVDE:
+        case MISC_BUILTIN_DIVDEU:
+          if (fn == MISC_BUILTIN_DIVDE)
+            vname = P10V_BUILTIN_DIVES_V2DI;
+          else
+            vname = P10V_BUILTIN_DIVEU_V2DI;
+          break;
+        case P10_BUILTIN_CFUGED:
+          vname = P10V_BUILTIN_VCFUGED;
+          break;
+        case P10_BUILTIN_CNTLZDM:
+          vname = P10V_BUILTIN_VCLZDM;
+          break;
+        case P10_BUILTIN_CNTTZDM:
+          vname = P10V_BUILTIN_VCTZDM;
+          break;
+        case P10_BUILTIN_PDEPD:
+          vname = P10V_BUILTIN_VPDEPD;
+          break;
+        case P10_BUILTIN_PEXTD:
+          vname = P10V_BUILTIN_VPEXTD;
+          break;
+        default:
+          return NULL_TREE;
+        }
+
+      if (vname != RS6000_BUILTIN_COUNT
+          && in_mode == exp_mode
+          && in_vmode == exp_vmode)
+        return rs6000_builtin_decls[vname];
+    }
+
   return NULL_TREE;
 }
 
diff --git a/gcc/testsuite/gcc.target/powerpc/dive-vectorize-1.c b/gcc/testsuite/gcc.target/powerpc/dive-vectorize-1.c
new file mode 100644
index 00000000000..84f1b0a88f2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/dive-vectorize-1.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -ftree-vectorize -fno-vect-cost-model -fno-unroll-loops -fdump-tree-vect-details" } */
+
+/* Test if signed/unsigned int extended divisions get vectorized.  */
+
+#include "dive-vectorize-1.h"
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" } } */
+/* { dg-final { scan-assembler-times {\mvdivesw\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvdiveuw\M} 1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/dive-vectorize-1.h b/gcc/testsuite/gcc.target/powerpc/dive-vectorize-1.h
new file mode 100644
index 00000000000..119f637b46b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/dive-vectorize-1.h
@@ -0,0 +1,22 @@
+#define N 128
+
+typedef signed int si;
+typedef unsigned int ui;
+
+si si_a[N], si_b[N], si_c[N];
+ui ui_a[N], ui_b[N], ui_c[N];
+
+__attribute__ ((noipa)) void
+test_divwe ()
+{
+  for (int i = 0; i < N; i++)
+    si_c[i] = __builtin_divwe (si_a[i], si_b[i]);
+}
+
+__attribute__ ((noipa)) void
+test_divweu ()
+{
+  for (int i = 0; i < N; i++)
+    ui_c[i] = __builtin_divweu (ui_a[i], ui_b[i]);
+}
+
diff --git a/gcc/testsuite/gcc.target/powerpc/dive-vectorize-2.c b/gcc/testsuite/gcc.target/powerpc/dive-vectorize-2.c
new file mode 100644
index 00000000000..4db0085562f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/dive-vectorize-2.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -ftree-vectorize -fno-vect-cost-model -fno-unroll-loops -fdump-tree-vect-details" } */
+
+/* Test if signed/unsigned long long extended divisions get vectorized.  */
+
+#include "dive-vectorize-2.h"
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" } } */
+/* { dg-final { scan-assembler-times {\mvdivesd\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvdiveud\M} 1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/dive-vectorize-2.h b/gcc/testsuite/gcc.target/powerpc/dive-vectorize-2.h
new file mode 100644
index 00000000000..1cab56b2e0b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/dive-vectorize-2.h
@@ -0,0 +1,22 @@
+#define N 128
+
+typedef signed long long sLL;
+typedef unsigned long long uLL;
+
+sLL sll_a[N], sll_b[N], sll_c[N];
+uLL ull_a[N], ull_b[N], ull_c[N];
+
+__attribute__ ((noipa)) void
+test_divde ()
+{
+  for (int i = 0; i < N; i++)
+    sll_c[i] = __builtin_divde (sll_a[i], sll_b[i]);
+}
+
+__attribute__ ((noipa)) void
+test_divdeu ()
+{
+  for (int i = 0; i < N; i++)
+    ull_c[i] = __builtin_divdeu (ull_a[i], ull_b[i]);
+}
+
diff --git a/gcc/testsuite/gcc.target/powerpc/dive-vectorize-run-1.c b/gcc/testsuite/gcc.target/powerpc/dive-vectorize-run-1.c
new file mode 100644
index 00000000000..1d5cbaa9f9b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/dive-vectorize-run-1.c
@@ -0,0 +1,52 @@
+/* { dg-do run } */
+/* { dg-require-effective-target power10_hw } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -ftree-vectorize -fno-vect-cost-model" } */
+
+#include "dive-vectorize-1.h"
+
+/* Check if test cases with signed/unsigned int extended division
+   vectorization run successfully.  */
+
+__attribute__ ((optimize (1))) void
+check_divwe ()
+{
+  test_divwe ();
+  for (int i = 0; i < N; i++)
+    {
+      si exp = __builtin_divwe (si_a[i], si_b[i]);
+      if (exp != si_c[i])
+        __builtin_abort ();
+    }
+}
+
+__attribute__ ((optimize (1))) void
+check_divweu ()
+{
+  test_divweu ();
+  for (int i = 0; i < N; i++)
+    {
+      ui exp = __builtin_divweu (ui_a[i], ui_b[i]);
+      if (exp != ui_c[i])
+        __builtin_abort ();
+    }
+}
+
+int
+main ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      si_a[i] = 0x10 * (i * 3 + 2);
+      si_b[i] = 0x7890 * (i * 3 + 1);
+      ui_a[i] = 0x234 * (i * 11 + 3) - 0xcd * (i * 5 - 7);
+      ui_b[i] = 0x6078 * (i * 7 + 3) + 0xef * (i * 7 - 11);
+      if (si_b[i] == 0 || ui_b[i] == 0)
+        __builtin_abort ();
+    }
+
+  check_divwe ();
+  check_divweu ();
+
+  return 0;
+}
+
diff --git a/gcc/testsuite/gcc.target/powerpc/dive-vectorize-run-2.c b/gcc/testsuite/gcc.target/powerpc/dive-vectorize-run-2.c
new file mode 100644
index 00000000000..921b07b3f1b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/dive-vectorize-run-2.c
@@ -0,0 +1,53 @@
+/* { dg-do run } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-require-effective-target power10_hw } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -ftree-vectorize -fno-vect-cost-model" } */
+
+#include "dive-vectorize-2.h"
+
+/* Check if test cases with signed/unsigned int extended division
+   vectorization run successfully.  */
+
+__attribute__ ((optimize (1))) void
+check_divde ()
+{
+  test_divde ();
+  for (int i = 0; i < N; i++)
+    {
+      sLL exp = __builtin_divde (sll_a[i], sll_b[i]);
+      if (exp != sll_c[i])
+        __builtin_abort ();
+    }
+}
+
+__attribute__ ((optimize (1))) void
+check_divdeu ()
+{
+  test_divdeu ();
+  for (int i = 0; i < N; i++)
+    {
+      uLL exp = __builtin_divdeu (ull_a[i], ull_b[i]);
+      if (exp != ull_c[i])
+        __builtin_abort ();
+    }
+}
+
+int
+main ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      sll_a[i] = 0x102 * (i * 3 + 2);
+      sll_b[i] = 0x789ab * (i * 3 + 1);
+      ull_a[i] = 0x2345 * (i * 11 + 3) - 0xcd1 * (i * 5 - 7);
+      ull_b[i] = 0x6078e * (i * 7 + 3) + 0xefa * (i * 7 - 11);
+      if (sll_b[i] == 0 || ull_b[i] == 0)
+        __builtin_abort ();
+    }
+
+  check_divde ();
+  check_divdeu ();
+
+  return 0;
+}
+
diff --git a/gcc/testsuite/gcc.target/powerpc/p10-bifs-vectorize-1.c b/gcc/testsuite/gcc.target/powerpc/p10-bifs-vectorize-1.c
new file mode 100644
index 00000000000..9b8b3642ead
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/p10-bifs-vectorize-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -ftree-vectorize -fno-vect-cost-model -fno-unroll-loops -fdump-tree-vect-details" } */
+
+/* Test if some Power10 built-in functions get vectorized.  */
+
+#include "p10-bifs-vectorize-1.h"
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 5 "vect" } } */
+/* { dg-final { scan-assembler-times {\mvcfuged\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvclzdm\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvctzdm\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvpdepd\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvpextd\M} 1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/p10-bifs-vectorize-1.h b/gcc/testsuite/gcc.target/powerpc/p10-bifs-vectorize-1.h
new file mode 100644
index 00000000000..80b7aacf810
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/p10-bifs-vectorize-1.h
@@ -0,0 +1,40 @@
+#define N 32
+
+typedef unsigned long long uLL;
+uLL ull_a[N], ull_b[N], ull_c[N];
+
+__attribute__ ((noipa)) void
+test_cfuged ()
+{
+  for (int i = 0; i < N; i++)
+    ull_c[i] = __builtin_cfuged (ull_a[i], ull_b[i]);
+}
+
+__attribute__ ((noipa)) void
+test_cntlzdm ()
+{
+  for (int i = 0; i < N; i++)
+    ull_c[i] = __builtin_cntlzdm (ull_a[i], ull_b[i]);
+}
+
+__attribute__ ((noipa)) void
+test_cnttzdm ()
+{
+  for (int i = 0; i < N; i++)
+    ull_c[i] = __builtin_cnttzdm (ull_a[i], ull_b[i]);
+}
+
+__attribute__ ((noipa)) void
+test_pdepd ()
+{
+  for (int i = 0; i < N; i++)
+    ull_c[i] = __builtin_pdepd (ull_a[i], ull_b[i]);
+}
+
+__attribute__ ((noipa)) void
+test_pextd ()
+{
+  for (int i = 0; i < N; i++)
+    ull_c[i] = __builtin_pextd (ull_a[i], ull_b[i]);
+}
+
diff --git a/gcc/testsuite/gcc.target/powerpc/p10-bifs-vectorize-run-1.c b/gcc/testsuite/gcc.target/powerpc/p10-bifs-vectorize-run-1.c
new file mode 100644
index 00000000000..4b3b1165663
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/p10-bifs-vectorize-run-1.c
@@ -0,0 +1,45 @@
+/* { dg-do run } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-require-effective-target power10_hw } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -ftree-vectorize -fno-vect-cost-model" } */
+
+#include "p10-bifs-vectorize-1.h"
+
+/* Check if vectorized built-in functions run expectedly.  */
+
+#define CHECK(name)                                        \
+  __attribute__ ((optimize (1))) void check_##name ()      \
+  {                                                        \
+    test_##name ();                                        \
+    for (int i = 0; i < N; i++)                            \
+      {                                                    \
+        uLL exp = __builtin_##name (ull_a[i], ull_b[i]);   \
+        if (exp != ull_c[i])                               \
+          __builtin_abort ();                              \
+      }                                                    \
+  }
+
+CHECK (cfuged)
+CHECK (cntlzdm)
+CHECK (cnttzdm)
+CHECK (pdepd)
+CHECK (pextd)
+
+int
+main ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      ull_a[i] = 0x789a * (i * 11 - 5) - 0xcd1 * (i * 5 - 7);
+      ull_b[i] = 0xfedc * (i * 7 + 3) + 0x467 * (i * 7 - 11);
+    }
+
+  check_cfuged ();
+  check_cntlzdm ();
+  check_cnttzdm ();
+  check_pdepd ();
+  check_pextd ();
+
+  return 0;
+}
+