From patchwork Thu Jun 16 10:58:24 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tamar Christina
X-Patchwork-Id: 1644272
Date: Thu, 16 Jun 2022 11:58:24 +0100
To: gcc-patches@gcc.gnu.org
Subject: [PATCH]middle-end Add optimized float addsub without needing VEC_PERM_EXPR.
List-Id: Gcc-patches mailing list
X-Patchwork-Original-From: Tamar Christina via Gcc-patches
From: Tamar Christina
Reply-To: Tamar Christina
Cc: nd@arm.com, rguenther@suse.de

Hi All,

For IEEE 754 floating point formats we can replace a sequence of alternating
+/- operations with an fneg of a wider type followed by an fadd.  This
eliminates the need for a permutation.  This patch adds a match.pd rule to
recognize the pattern and perform the rewrite.
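To make the bit-level argument concrete, here is a minimal scalar sketch in
plain C.  It is not part of the patch.  The names addsub_ref and
addsub_wide_neg are made up for illustration, and the sketch assumes a
little-endian host with IEEE 754 binary32 floats, matching the
!BYTES_BIG_ENDIAN guard in the rule.  Flipping bit 63 of a 64-bit lane is
what an fneg on a double does, and on little-endian that bit is the sign bit
of the higher-addressed float of each pair.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reference semantics of the loop: add in lane i, subtract in lane i+1.  */
void addsub_ref (const float *a, const float *b, float *res, size_t n)
{
  assert (n % 2 == 0);
  for (size_t i = 0; i < n; i += 2)
    {
      res[i + 0] = a[i + 0] + b[i + 0];
      res[i + 1] = a[i + 1] - b[i + 1];
    }
}

/* Same result using only a "wide negate" and an add.
   ASSUMPTION: little-endian host, so bit 63 of each 64-bit lane is the
   sign bit of b[i + 1].  On big-endian it would be the sign of b[i].  */
void addsub_wide_neg (const float *a, const float *b, float *res, size_t n)
{
  assert (n % 2 == 0);
  for (size_t i = 0; i < n; i += 2)
    {
      uint64_t lane;
      float nb[2];
      memcpy (&lane, &b[i], sizeof lane);  /* view two floats as one lane */
      lane ^= UINT64_C (1) << 63;          /* flip the sign bit of the lane */
      memcpy (nb, &lane, sizeof nb);       /* view the lane as two floats */
      res[i + 0] = a[i + 0] + nb[0];       /* nb[0] == b[i]      */
      res[i + 1] = a[i + 1] + nb[1];       /* nb[1] == -b[i + 1] */
    }
}
```

Calling both functions on the same inputs should give bit-identical results
for ordinary (non-NaN) values, since a - b and a + (-b) are the same
operation under IEEE 754.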
For

void f (float *restrict a, float *restrict b, float *res, int n)
{
   for (int i = 0; i < (n & -4); i+=2)
    {
      res[i+0] = a[i+0] + b[i+0];
      res[i+1] = a[i+1] - b[i+1];
    }
}

we now generate:

.L3:
        ldr     q1, [x1, x3]
        ldr     q0, [x0, x3]
        fneg    v1.2d, v1.2d
        fadd    v0.4s, v0.4s, v1.4s
        str     q0, [x2, x3]
        add     x3, x3, 16
        cmp     x3, x4
        bne     .L3

instead of:

.L3:
        ldr     q1, [x0, x3]
        ldr     q2, [x1, x3]
        fadd    v0.4s, v1.4s, v2.4s
        fsub    v1.4s, v1.4s, v2.4s
        tbl     v0.16b, {v0.16b - v1.16b}, v3.16b
        str     q0, [x2, x3]
        add     x3, x3, 16
        cmp     x3, x4
        bne     .L3

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Thanks to George Steed for the idea.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

        * match.pd: Add fneg/fadd rule.

gcc/testsuite/ChangeLog:

        * gcc.target/aarch64/simd/addsub_1.c: New test.
        * gcc.target/aarch64/sve/addsub_1.c: New test.

--- inline copy of patch --
diff --git a/gcc/match.pd b/gcc/match.pd
index 51b0a1b562409af535e53828a10c30b8a3e1ae2e..af1c98d4a2831f38258d6fc1bbe811c8ee6c7c6e 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -7612,6 +7612,49 @@ and,
   (simplify (reduc (op @0 VECTOR_CST@1))
     (op (reduc:type @0) (reduc:type @1))))
 
+/* Simplify vector floating point operations of alternating sub/add pairs
+   into using an fneg of a wider element type followed by a normal add.
+   under IEEE 754 the fneg of the wider type will negate every even entry
+   and when doing an add we get a sub of the even and add of every odd
+   elements.  */
+(simplify
+ (vec_perm (plus:c @0 @1) (minus @0 @1) VECTOR_CST@2)
+ (if (!VECTOR_INTEGER_TYPE_P (type) && !BYTES_BIG_ENDIAN)
+  (with
+   {
+     /* Build a vector of integers from the tree mask.  */
+     vec_perm_builder builder;
+     if (!tree_to_vec_perm_builder (&builder, @2))
+       return NULL_TREE;
+
+     /* Create a vec_perm_indices for the integer vector.  */
+     poly_uint64 nelts = TYPE_VECTOR_SUBPARTS (type);
+     vec_perm_indices sel (builder, 2, nelts);
+   }
+   (if (sel.series_p (0, 2, 0, 2))
+    (with
+     {
+       machine_mode vec_mode = TYPE_MODE (type);
+       auto elem_mode = GET_MODE_INNER (vec_mode);
+       auto nunits = exact_div (GET_MODE_NUNITS (vec_mode), 2);
+       tree stype;
+       switch (elem_mode)
+         {
+         case E_HFmode:
+           stype = float_type_node;
+           break;
+         case E_SFmode:
+           stype = double_type_node;
+           break;
+         default:
+           return NULL_TREE;
+         }
+       tree ntype = build_vector_type (stype, nunits);
+       if (!ntype)
+         return NULL_TREE;
+     }
+     (plus (view_convert:type (negate (view_convert:ntype @1))) @0))))))
+
 (simplify
  (vec_perm @0 @1 VECTOR_CST@2)
  (with
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/addsub_1.c b/gcc/testsuite/gcc.target/aarch64/simd/addsub_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..1fb91a34c421bbd2894faa0dbbf1b47ad43310c4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/addsub_1.c
@@ -0,0 +1,56 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_ok } */
+/* { dg-options "-Ofast" } */
+/* { dg-add-options arm_v8_2a_fp16_neon } */
+/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */
+
+#pragma GCC target "+nosve"
+
+/*
+** f1:
+** ...
+** fneg v[0-9]+.2d, v[0-9]+.2d
+** fadd v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+** ...
+*/
+void f1 (float *restrict a, float *restrict b, float *res, int n)
+{
+   for (int i = 0; i < (n & -4); i+=2)
+    {
+      res[i+0] = a[i+0] + b[i+0];
+      res[i+1] = a[i+1] - b[i+1];
+    }
+}
+
+/*
+** d1:
+** ...
+** fneg v[0-9]+.4s, v[0-9]+.4s
+** fadd v[0-9]+.8h, v[0-9]+.8h, v[0-9]+.8h
+** ...
+*/
+void d1 (_Float16 *restrict a, _Float16 *restrict b, _Float16 *res, int n)
+{
+   for (int i = 0; i < (n & -8); i+=2)
+    {
+      res[i+0] = a[i+0] + b[i+0];
+      res[i+1] = a[i+1] - b[i+1];
+    }
+}
+
+/*
+** e1:
+** ...
+** fadd v[0-9]+.2d, v[0-9]+.2d, v[0-9]+.2d
+** fsub v[0-9]+.2d, v[0-9]+.2d, v[0-9]+.2d
+** ins v[0-9]+.d\[1\], v[0-9]+.d\[1\]
+** ...
+*/
+void e1 (double *restrict a, double *restrict b, double *res, int n)
+{
+   for (int i = 0; i < (n & -4); i+=2)
+    {
+      res[i+0] = a[i+0] + b[i+0];
+      res[i+1] = a[i+1] - b[i+1];
+    }
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/addsub_1.c b/gcc/testsuite/gcc.target/aarch64/sve/addsub_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..ea7f9d9db2c8c9a3efe5c7951a314a29b7a7a922
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/addsub_1.c
@@ -0,0 +1,52 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast" } */
+/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */
+
+/*
+** f1:
+** ...
+** fneg z[0-9]+.d, p[0-9]+/m, z[0-9]+.d
+** fadd z[0-9]+.s, z[0-9]+.s, z[0-9]+.s
+** ...
+*/
+void f1 (float *restrict a, float *restrict b, float *res, int n)
+{
+   for (int i = 0; i < (n & -4); i+=2)
+    {
+      res[i+0] = a[i+0] + b[i+0];
+      res[i+1] = a[i+1] - b[i+1];
+    }
+}
+
+/*
+** d1:
+** ...
+** fneg z[0-9]+.s, p[0-9]+/m, z[0-9]+.s
+** fadd z[0-9]+.h, z[0-9]+.h, z[0-9]+.h
+** ...
+*/
+void d1 (_Float16 *restrict a, _Float16 *restrict b, _Float16 *res, int n)
+{
+   for (int i = 0; i < (n & -8); i+=2)
+    {
+      res[i+0] = a[i+0] + b[i+0];
+      res[i+1] = a[i+1] - b[i+1];
+    }
+}
+
+/*
+** e1:
+** ...
+** fsub z[0-9]+.d, z[0-9]+.d, z[0-9]+.d
+** movprfx z[0-9]+.d, p[0-9]+/m, z[0-9]+.d
+** fadd z[0-9]+.d, p[0-9]+/m, z[0-9]+.d, z[0-9]+.d
+** ...
+*/
+void e1 (double *restrict a, double *restrict b, double *res, int n)
+{
+   for (int i = 0; i < (n & -4); i+=2)
+    {
+      res[i+0] = a[i+0] + b[i+0];
+      res[i+1] = a[i+1] - b[i+1];
+    }
+}