From patchwork Fri Sep 20 12:44:33 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Joe Ramsay X-Patchwork-Id: 1987902 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=arm.com header.i=@arm.com header.a=rsa-sha256 header.s=selector1 header.b=c+QulyuF; dkim=pass (1024-bit key) header.d=arm.com header.i=@arm.com header.a=rsa-sha256 header.s=selector1 header.b=c+QulyuF; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=sourceware.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=libc-alpha-bounces~incoming=patchwork.ozlabs.org@sourceware.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4X9BwX6Hz2z1xrD for ; Fri, 20 Sep 2024 22:46:52 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id B33773857000 for ; Fri, 20 Sep 2024 12:46:50 +0000 (GMT) X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from EUR03-DBA-obe.outbound.protection.outlook.com (mail-dbaeur03on20610.outbound.protection.outlook.com [IPv6:2a01:111:f403:260d::610]) by sourceware.org (Postfix) with ESMTPS id 1B3C43858C50 for ; Fri, 20 Sep 2024 12:45:06 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 1B3C43858C50 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 1B3C43858C50 Authentication-Results: server2.sourceware.org; arc=pass smtp.remote-ip=2a01:111:f403:260d::610 ARC-Seal: i=3; a=rsa-sha256; d=sourceware.org; s=key; t=1726836309; cv=pass; b=HMr6YvTV32UOziCGMB2y6uH/9ntWR2MzTljHoJlICzfgn5Y4TZbND91TEE/YclLHj64HMoPH5PE+6fbbVNHk9q0N5H3HCRpPYk9ICIFcPXRulXkMcL0ce7TX3Ciote2j6iJuiRImpb4+AYRkP29xqihf1c/YdCL18jDMuyV5hPA= ARC-Message-Signature: i=3; a=rsa-sha256; d=sourceware.org; s=key; t=1726836309; c=relaxed/simple; bh=3ah8fXOCLRtmf79cQal/Cz2UsGhydp/6fgWSNgMZmUo=; h=DKIM-Signature:DKIM-Signature:From:To:Subject:Date:Message-ID: MIME-Version; b=E35CdP/Oi8WY8uw86LosALebwRZ3CBjW8Tt+fYiYC9ri3Gh8m1PAaP7xl5HbQK42Qx5q1RJ/6aqg/3ESyBvvOZ5EB9baOXzFVH5lgfNde4tbtdGloN/nb8cClIk1gnyYgiewWnCoCluifzeseSvtWxpcrrV0CHBX39mnwCwM5h8= ARC-Authentication-Results: i=3; server2.sourceware.org ARC-Seal: i=2; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=pass; b=P/DOoBsi08/pXskmjuCtcuxOe9txBAlr60CCxIPUBxifEoReU3YfFaM/8dFX5BI2cEqexljUEzKH8wngY+R7/HmU9awKgA2ffxOwykZzq4PDT2o82yaAB52YPWl4r+oiMiI1J5im7SOnkOOk5ew+S3Sih4OtgtjfcdCRlOorTJQKNlYPYgGod7EX3JUyjIvJdM1VC2AupzGgfm9sE47UBPSURx/D4lwv6+dvsGyN3D70nrX/TH3drnYN86rU5hG2FyGFpqEjZz2DCqszT1T1f0LatoDtlETKp84pTWRWcU0nt4zXRm+ieAC/8ehce0ixzg+92QFgP0hVbUdwpIEAiA== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=8mNWa8LANjGqRrtNJ53tA9McRYTf0OifMt9QyrKtQeM=; b=PdFKycIxX/Bm71wDe+sqIkdd6kSpmMf9Twar1SyrOv3Eg5h2PGx5fKOoIKOohAsT1aqOKbbNuVhIX/MopNBtNxTx05CTfiXa+f3Xk4JTgtPTEEqSpmAq72iUk18ERobEJVpR/yyO1cTVSpymSqbVCIAkixfxksergSvwk4YpFE+hT/pL3mmqg3V27gj5PoR0RzzTiaIdoAmylxpN5xlHWY2d9ruiOiZcNorzOgghvHF7pdsuldoDa8YoyxudiUBwg6+zym0EJblh+/VkdYmG43ehIknJ/S75s5f0XmAsL0nGS50MIjz3ohrJGZNf3wyKHbD2dsleujsyiV1eq9ReIA== ARC-Authentication-Results: i=2; mx.microsoft.com 1; spf=pass (sender ip is 63.35.35.123) smtp.rcpttodomain=sourceware.org smtp.mailfrom=arm.com; dmarc=pass (p=none sp=none pct=100) action=none header.from=arm.com; dkim=pass (signature was verified) header.d=arm.com; arc=pass (0 oda=1 ltdi=1 spf=[1,1,smtp.mailfrom=arm.com] dmarc=[1,1,header.from=arm.com]) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=arm.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=8mNWa8LANjGqRrtNJ53tA9McRYTf0OifMt9QyrKtQeM=; b=c+QulyuFRNWAcif4lgaMapHRa3LH9H8aFmS9ZVuJDFP0u6DyWS6nekQ0ew7B5DvzqnxjSqr9rLvQUaDnRzxpl9GwEIeJDPL8mw0JX7J/DUdFGbQOIwmvFLFJlbjCMMO3PdpwCcSYSZ7kG3JDFDE8gaZBuZ3ghpq3FYEUe4VIjmM= Received: from DU2PR04CA0211.eurprd04.prod.outlook.com (2603:10a6:10:2b1::6) by AS1PR08MB7515.eurprd08.prod.outlook.com (2603:10a6:20b:482::17) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.8005.7; Fri, 20 Sep 2024 12:44:57 +0000 Received: from DU2PEPF0001E9C6.eurprd03.prod.outlook.com (2603:10a6:10:2b1:cafe::ca) by DU2PR04CA0211.outlook.office365.com (2603:10a6:10:2b1::6) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7939.30 via Frontend Transport; Fri, 20 Sep 2024 12:44:57 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 63.35.35.123) smtp.mailfrom=arm.com; dkim=pass (signature was verified) header.d=arm.com;dmarc=pass action=none header.from=arm.com; Received-SPF: Pass (protection.outlook.com: domain of arm.com designates 63.35.35.123 as permitted sender) receiver=protection.outlook.com; client-ip=63.35.35.123; helo=64aa7808-outbound-1.mta.getcheckrecipient.com; pr=C Received: from 64aa7808-outbound-1.mta.getcheckrecipient.com (63.35.35.123) by DU2PEPF0001E9C6.mail.protection.outlook.com (10.167.8.75) with Microsoft SMTP Server (version=TLS1_3, cipher=TLS_AES_256_GCM_SHA384) id 15.20.7918.13 via Frontend Transport; Fri, 20 Sep 2024 12:44:54 +0000 Received: ("Tessian outbound f9d5b330f2e0:v465"); Fri, 20 Sep 2024 12:44:54 +0000 X-CheckRecipientChecked: true X-CR-MTA-CID: 993fff2f63895b62 X-TessianGatewayMetadata: tWNNCdAK+VO204LZh3myFExchHRtP1TF7mS9KKmmUvBC3PhOWhCOMs+CToxDXw84HZvA8XHdvXof5yh5xxM+iyQn1y6PrnXwfCIpud/HysBv8aZMtM2Khx6ghMHFflh7ApqcU4uazNJwhHnXPw7Vm7ZehH34uBrCU8FB9/0wy00= X-CR-MTA-TID: 64aa7808 Received: from L23779db05c55.1 by 64aa7808-outbound-1.mta.getcheckrecipient.com id 01C95444-959C-4853-BAE8-4A6E280D2CFA.1; Fri, 20 Sep 2024 12:44:48 +0000 Received: from EUR05-AM6-obe.outbound.protection.outlook.com by 64aa7808-outbound-1.mta.getcheckrecipient.com with ESMTPS id L23779db05c55.1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384); Fri, 20 Sep 2024 12:44:48 +0000 ARC-Seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none; b=Wjzv0BlHB1ZUMWGcoFGCGKYiBXAyc9c9rtnyR3m0XNxsoMejWU3wU02B+hj1EQr8Cooxn71UrWCGoEr8zt/xWide9tcrCWhIZLObhjfQXBQo2cahyeLRQ/vwubHyHCG/wJ+DSWvMQDlIQ+pQA/W2/9D8451sJrkA6+N/JZRS4IWwlYqj00e3L5Aw4ZHcdKMcczOMSU61tdSRua5IdG9vNWCHzH40TWNViCiyyCb+851MUkTU3lHEQLxbJ1Q2j0S9N427/0wwiVRjCCwEMl3ytNzQQ5dEwyAJ2gexk37cXj1N84v7UBDBqrKqyNpk5VcQC2zM7tlfipJmhdZgSlWZoA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=8mNWa8LANjGqRrtNJ53tA9McRYTf0OifMt9QyrKtQeM=; b=l7wQulRUHSbOBGTnEHOPqRyJL7WlCpwd6wWiHh0l7I7NucCam/zfzA4FZ/NP1D69FJxlEIEaZ7CuUb+7vtJWGZAr+JDmGjI+PK6WdzoyDWJuvDvBT3T76YGCwuO4w5P9C3Ku5nwOqIoYCvIeSFVGC0PPSx5cTXoRW820NZU+Ijxy7WoszW76yfK4pB3/oh3wBiv+xoIu3VaCIzTpfU+hYK2bjsod2bpJiBZLDmv6gKNVQvuL5K6Z/kGW97fq8Kf2TyN9ADRr5qYH/BgB77/GeXmpy5C4M84Zyi722vJ9N1gNkth/7H2y5egD17G8h0g48ax17pUcpAnRqUFSo/lICg== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 40.67.248.234) smtp.rcpttodomain=sourceware.org smtp.mailfrom=arm.com; dmarc=pass (p=none sp=none pct=100) action=none header.from=arm.com; dkim=none (message not signed); arc=none (0) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=arm.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=8mNWa8LANjGqRrtNJ53tA9McRYTf0OifMt9QyrKtQeM=; b=c+QulyuFRNWAcif4lgaMapHRa3LH9H8aFmS9ZVuJDFP0u6DyWS6nekQ0ew7B5DvzqnxjSqr9rLvQUaDnRzxpl9GwEIeJDPL8mw0JX7J/DUdFGbQOIwmvFLFJlbjCMMO3PdpwCcSYSZ7kG3JDFDE8gaZBuZ3ghpq3FYEUe4VIjmM= Received: from AS9PR06CA0511.eurprd06.prod.outlook.com (2603:10a6:20b:49d::6) by AS8PR08MB9885.eurprd08.prod.outlook.com (2603:10a6:20b:5b0::15) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7982.16; Fri, 20 Sep 2024 12:44:45 +0000 Received: from AM3PEPF0000A798.eurprd04.prod.outlook.com (2603:10a6:20b:49d:cafe::16) by AS9PR06CA0511.outlook.office365.com (2603:10a6:20b:49d::6) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7962.30 via Frontend Transport; Fri, 20 Sep 2024 12:44:45 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 40.67.248.234) smtp.mailfrom=arm.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=arm.com; Received-SPF: Pass (protection.outlook.com: domain of arm.com designates 40.67.248.234 as permitted sender) receiver=protection.outlook.com; client-ip=40.67.248.234; helo=nebula.arm.com; pr=C Received: from nebula.arm.com (40.67.248.234) by AM3PEPF0000A798.mail.protection.outlook.com (10.167.16.103) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.20.7918.13 via Frontend Transport; Fri, 20 Sep 2024 12:44:44 +0000 Received: from AZ-NEU-EX06.Arm.com (10.240.25.134) by AZ-NEU-EX04.Arm.com (10.251.24.32) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.39; Fri, 20 Sep 2024 12:44:44 +0000 Received: from AZ-NEU-EX04.Arm.com (10.251.24.32) by AZ-NEU-EX06.Arm.com (10.240.25.134) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.39; Fri, 20 Sep 2024 12:44:43 +0000 Received: from vcn-man-apps.manchester.arm.com (10.32.108.22) by mail.arm.com (10.251.24.32) with Microsoft SMTP Server id 15.1.2507.39 via Frontend Transport; Fri, 20 Sep 2024 12:44:43 +0000 From: Joe Ramsay To: CC: Joe Ramsay Subject: [PATCH] aarch64: Improve codegen in SVE expf & related routines Date: Fri, 20 Sep 2024 13:44:33 +0100 Message-ID: <20240920124437.1908340-1-Joe.Ramsay@arm.com> X-Mailer: git-send-email 2.27.0 MIME-Version: 1.0 X-EOPAttributedMessage: 1 X-MS-TrafficTypeDiagnostic: AM3PEPF0000A798:EE_|AS8PR08MB9885:EE_|DU2PEPF0001E9C6:EE_|AS1PR08MB7515:EE_ X-MS-Office365-Filtering-Correlation-Id: f0a67278-5fec-4b8e-5164-08dcd9720732 x-checkrecipientrouted: true NoDisclaimer: true X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam-Untrusted: BCL:0; ARA:13230040|36860700013|82310400026|376014|1800799024; X-Microsoft-Antispam-Message-Info-Original: sytXhmHMZ0fLAlj7EMsKZ4sqplGzaKftQ4L92ycW61Kqine1ifYJS+3bb4X5X18c6yIzvKPal5Wp2qsuJA/6cNt7O0S2STUq+5EL2PaXHB12hR++qdaBGD6avkoVK36xs4u3/z1pag/RtJI5Vh++7bFkG0oSZWDybAO1c4ynnJ0g3rxiKoAM9xAd5IzGquXNSHXxCjUnsTEPIMFcJa2EyPqFb8sIKJbrtJWI+az0ACEtlRA/C8/3x5Lepvbmn/8gubic3AlBFH2x7pbapCKYKa5SacL7C66pnZ8UNS1nQsE1a4vSx05z+V0XGNBzb/Ybbt2LGO0K92I/uQsfQqJKygGijZoX/BBdsws93+iN9P8ORgcCUfOK9a9h/GLIqvmKxQKm3isdo/n70zTe/bARtbIA6mvcWSxCACfa/9Qw2O5VROm+T1BMCshPlu8Vb681I6ZMP5ml0+GdukGuxRefVC0oBImgmiat3nWLL4lwjnohyDUpmuM/QkDjQhVN4WcWSCv2qW7kAvjxj2DlOJL5CR9HKUo+K5GuJOHNr+R6XYku1Ycf2WH/Is2f1gJp1eAQvFs46YoxsLcMt7eTptbqJwQ9IDIXIlTjVCfVe0Y0w5+aabJ5JrtBlUBacCLRauWSpu4TtHy7PQbfEUwDoytASnYWu5KQWKQ2tFiX91sCl5A26/MuQ48yOWS7McaK0gCjVJuE/9D2u6o+whWiBq8u3o9cjsf/2FK76hEv7R4yfqU+cEBjrXN4CzxaqHN8v/GdR1685jKk7WVOpqDve9XtHf2zE44ue+z/9CP+WS7hoJVnNj4hd1bpzbypdegKT8Qoa31t5kVdtcf0EtoH0UWOcX3Oh4HkGgbMOgDA7RpqPQO4QDNfSngrl4p9BjsqZIEFQbUEcK/ggLPV0jpzfDTHhL3gkTTebSTbQ7e+UydCy7VQ11x/HRN3igV2iFaE/befPqo2KyE8uSrwY9HZ0rDT2lE/jHYgazHUKPNK56rC1poync4MvvJvEtNbVHvZ40LUlhpUjUFr494qY3qTODu9zQZMwQsstPe8XjRAjiJ9ZZonntNQT5BAJmMDVVc8Hpg8nVLsdgIpJiximQP0ebn9Hm0ay41fmIOUKwq6K42MrR9OdJLVibwJplUsxp9dSR1SUNxvzNZrwIJ/GTTlbWqLXltOYdvcibjx+BvocJK361Q+DrBE8TDR8bHeqUlIqGonKGFMYzV3HrQfiv56hDIhz+nnzcNbCq4h8hjBDrl2afCEE03XwPB62ETY3L9dHStpFHBF419KrsgFwso07DVr257DbA2SN1sPvOqTKAeZjS9bc79RvlapCwcnyDSIYIo79CDB/V2MZCasebuZgtlA54P9jFdYxYhT+0hcojTdbKzvmWw/6P1iurVmhtWDJBYP X-Forefront-Antispam-Report-Untrusted: CIP:40.67.248.234; CTRY:IE; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:nebula.arm.com; PTR:InfoDomainNonexistent; CAT:NONE; SFS:(13230040)(36860700013)(82310400026)(376014)(1800799024); DIR:OUT; SFP:1101; X-MS-Exchange-Transport-CrossTenantHeadersStamped: AS8PR08MB9885 X-MS-Exchange-SkipListedInternetSender: ip=[2603:10a6:20b:49d::6]; domain=AS9PR06CA0511.eurprd06.prod.outlook.com X-MS-Exchange-Transport-CrossTenantHeadersStripped: DU2PEPF0001E9C6.eurprd03.prod.outlook.com X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id-Prvs: a9e72dd5-999d-4712-8530-08dcd972013f X-Microsoft-Antispam: BCL:0; ARA:13230040|376014|36860700013|82310400026|1800799024|35042699022; X-Microsoft-Antispam-Message-Info: +nncz2c1Fqu0smSIQ/tITe6i0sZ0SmDFqxR/DyJ6O/O+d8pzDFkHtCe4OuMxK6AgFu80Y1cVwEzEvqEZdmhqHu120tL+4SPpPLFZlFhHSxE56tJql6iAWlOInvNAui1iC3C2nsUlkLui3CerrkK+SBNa81wxZtXPm0nk4i/Oz8w2xEv6SaeHs33J2IRcS19W3Fo9Oy/7GVj71aR4Roknz5QWfLECITwmqcNtAeG0FUEksLJCnxx2vltZ0IH7kqk4svpmLlENj2qq1zyc4FTtK0grCgIWDYHWcsKEKBcgmaPMZeGZjux2vWVAL/LKU9afKKHfwJU7c+9NQW5QBVqYDSUUVygsFQyPifvgesn2OrYSBhoo0Ee5Afb1LLYHOngyNlcgprCFe9Td/tCP0OdlDBErw2cLyfBvxOTiOSx8IU1KrLoKzuyau0qKZGJ2SSJ09KvTUdeAIm6pMxE9TO9Enhfd3/F3Z8VLiqVSxutywAfxLfHh9O1pxs2iwzPwfZysqJXPVWn8tgUGAHo4ELymfrp7HSvfG+hMuNBAPzX8fp689Ser2GyiD2mY6ygHdB5jlEACNzJeqvfxlCATuekM2NNhaBG4Yhn8D+7pQmviizIxXyGNCmpGdvqq1rlFl5K4JjdVpkgQHaUzTeYRlsr/F3ag8az7hRpBVqHC6bifYLhkAMpoRGyLBAKe+VIV/fgqUxu3KdR5+ygC9cFlmTY+t/Z/w56DlLm+BBKiAlBjR3uXjVExkhfu3J+moQ5EXgYGy/29NIu2+8r461ecexgmBW6znrVK7Pgu5FhVWODAd9Sqly8n4iil6laptWP6XDM6VbK/gcaLdwnLYmaKU5gL01MPGtAOGSxScbQ+AfKDTKAJjBzuekq2eCyhdK6LvSy0A/bhHAMuEQLIlltnNK5JRXjPBgNErKOG0OJZ8ipjirBSAvIA3Lq6T68Ow4N0VOoicM7m4oRGGzsFUJ9PcReltFjV9c2O2eDO/LENHlTrcK1X++5adslOrFufbl09jXQxIJe7BeV2mqJi5g/IDKvDLReHun1MoFnHlSV2oV4CwD0K1vJGY418jPkAGxrBb1yJXWIzhtYbfuo7ssWksKoHvg/c+dd8HidFoNGk8/BIUt6GnNmLRhUpmwICExrUJHplQIHvZPi+z5T+5XkkVazI9r+RTV2PD/te1I1oZ42HpBCw8WLuwBJY/gOTihhBvkA8ZVPkmAiwp3WHqd9bsilbna7C/6XCOZFK5EKrNNTQhpIyrkKshtJlecNREB3pflc4s//+PbDeTho1k76afrfqXNq7XpNnf/Xjwzlx5urwzoP1m/rJBS6+RAcGkOOfTkiWbAJbuSrRnqnxEGxiSypPlhIiv8kPZstwj/S1pNwo1ZLGxV9lepUoGbnV4UEX8XAn X-Forefront-Antispam-Report: CIP:63.35.35.123; CTRY:IE; LANG:en; SCL:1; SRV:; IPV:CAL; SFV:NSPM; H:64aa7808-outbound-1.mta.getcheckrecipient.com; PTR:ec2-63-35-35-123.eu-west-1.compute.amazonaws.com; CAT:NONE; SFS:(13230040)(376014)(36860700013)(82310400026)(1800799024)(35042699022); DIR:OUT; SFP:1101; X-OriginatorOrg: arm.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 20 Sep 2024 12:44:54.9848 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: f0a67278-5fec-4b8e-5164-08dcd9720732 X-MS-Exchange-CrossTenant-Id: f34e5979-57d9-4aaa-ad4d-b122a662184d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=f34e5979-57d9-4aaa-ad4d-b122a662184d; Ip=[63.35.35.123]; Helo=[64aa7808-outbound-1.mta.getcheckrecipient.com] X-MS-Exchange-CrossTenant-AuthSource: DU2PEPF0001E9C6.eurprd03.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: AS1PR08MB7515 X-Spam-Status: No, score=-13.5 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, SPF_HELO_NONE, SPF_NONE, TXREP, UNPARSEABLE_RELAY autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: libc-alpha-bounces~incoming=patchwork.ozlabs.org@sourceware.org Reduce MOV and MOVPRFX by improving special-case handling. Use inline helper to duplicate the entire computation between the special- and non-special case branches, removing the contention for z0 between x and the return value. Also rearrange some MLAs and MLSs - by making the multiplicand the destination we can avoid a MOVPRFX in several cases. Also change which constants go in the vector used for lanewise ops - the last lane is no longer wasted. Spotted that shift was incorrect in exp2f and exp10f, w.r.t. to the comment that explains it. Fixed - worst-case ULP for exp2f moves around but it doesn't change significantly for either routine. Worst-case error for coshf increases due to passing x to exp rather than abs(x) - updated the comment, but does not require regen-ulps. --- OK for master? If so please commit for as I don't have commit rights. Thanks, Joe sysdeps/aarch64/fpu/coshf_sve.c | 35 +++++++----- sysdeps/aarch64/fpu/exp10f_sve.c | 83 +++++++++++++++------------- sysdeps/aarch64/fpu/exp2f_sve.c | 70 +++++++++++++---------- sysdeps/aarch64/fpu/expf_sve.c | 62 +++++---------------- sysdeps/aarch64/fpu/sv_expf_inline.h | 34 ++++++------ 5 files changed, 136 insertions(+), 148 deletions(-) diff --git a/sysdeps/aarch64/fpu/coshf_sve.c b/sysdeps/aarch64/fpu/coshf_sve.c index e5d8a299c6..7ad6efa0fc 100644 --- a/sysdeps/aarch64/fpu/coshf_sve.c +++ b/sysdeps/aarch64/fpu/coshf_sve.c @@ -23,37 +23,42 @@ static const struct data { struct sv_expf_data expf_consts; - uint32_t special_bound; + float special_bound; } data = { .expf_consts = SV_EXPF_DATA, /* 0x1.5a92d8p+6: expf overflows above this, so have to use special case. */ - .special_bound = 0x42ad496c, + .special_bound = 0x1.5a92d8p+6, }; static svfloat32_t NOINLINE -special_case (svfloat32_t x, svfloat32_t y, svbool_t pg) +special_case (svfloat32_t x, svfloat32_t half_e, svfloat32_t half_over_e, + svbool_t pg) { - return sv_call_f32 (coshf, x, y, pg); + return sv_call_f32 (coshf, x, svadd_x (svptrue_b32 (), half_e, half_over_e), + pg); } /* Single-precision vector cosh, using vector expf. - Maximum error is 1.89 ULP: - _ZGVsMxv_coshf (-0x1.65898cp+6) got 0x1.f00aep+127 - want 0x1.f00adcp+127. */ + Maximum error is 2.77 ULP: + _ZGVsMxv_coshf(-0x1.5b38f4p+1) got 0x1.e45946p+2 + want 0x1.e4594cp+2. */ svfloat32_t SV_NAME_F1 (cosh) (svfloat32_t x, svbool_t pg) { const struct data *d = ptr_barrier (&data); - svfloat32_t ax = svabs_x (pg, x); - svbool_t special = svcmpge (pg, svreinterpret_u32 (ax), d->special_bound); + svbool_t special = svacge (pg, x, d->special_bound); - /* Calculate cosh by exp(x) / 2 + exp(-x) / 2. */ - svfloat32_t t = expf_inline (ax, pg, &d->expf_consts); - svfloat32_t half_t = svmul_x (pg, t, 0.5); - svfloat32_t half_over_t = svdivr_x (pg, t, 0.5); + /* Calculate cosh by exp(x) / 2 + exp(-x) / 2. + Note that x is passed to exp here, rather than |x|. This is to avoid using + destructive unary ABS for better register usage. However it means the + routine is not exactly symmetrical, as the exp helper is slightly less + accurate in the negative range. */ + svfloat32_t e = expf_inline (x, pg, &d->expf_consts); + svfloat32_t half_e = svmul_x (svptrue_b32 (), e, 0.5); + svfloat32_t half_over_e = svdivr_x (pg, e, 0.5); if (__glibc_unlikely (svptest_any (pg, special))) - return special_case (x, svadd_x (pg, half_t, half_over_t), special); + return special_case (x, half_e, half_over_e, special); - return svadd_x (pg, half_t, half_over_t); + return svadd_x (svptrue_b32 (), half_e, half_over_e); } diff --git a/sysdeps/aarch64/fpu/exp10f_sve.c b/sysdeps/aarch64/fpu/exp10f_sve.c index e09b2f3b27..8aa3fa9c43 100644 --- a/sysdeps/aarch64/fpu/exp10f_sve.c +++ b/sysdeps/aarch64/fpu/exp10f_sve.c @@ -18,74 +18,83 @@ . */ #include "sv_math.h" -#include "poly_sve_f32.h" -/* For x < -SpecialBound, the result is subnormal and not handled correctly by +/* For x < -Thres, the result is subnormal and not handled correctly by FEXPA. */ -#define SpecialBound 37.9 +#define Thres 37.9 static const struct data { - float poly[5]; - float shift, log10_2, log2_10_hi, log2_10_lo, special_bound; + float log2_10_lo, c0, c2, c4; + float c1, c3, log10_2; + float shift, log2_10_hi, thres; } data = { /* Coefficients generated using Remez algorithm with minimisation of relative error. rel error: 0x1.89dafa3p-24 abs error: 0x1.167d55p-23 in [-log10(2)/2, log10(2)/2] maxerr: 0.52 +0.5 ulp. */ - .poly = { 0x1.26bb16p+1f, 0x1.5350d2p+1f, 0x1.04744ap+1f, 0x1.2d8176p+0f, - 0x1.12b41ap-1f }, + .c0 = 0x1.26bb16p+1f, + .c1 = 0x1.5350d2p+1f, + .c2 = 0x1.04744ap+1f, + .c3 = 0x1.2d8176p+0f, + .c4 = 0x1.12b41ap-1f, /* 1.5*2^17 + 127, a shift value suitable for FEXPA. */ - .shift = 0x1.903f8p17f, + .shift = 0x1.803f8p17f, .log10_2 = 0x1.a934fp+1, .log2_10_hi = 0x1.344136p-2, .log2_10_lo = -0x1.ec10cp-27, - .special_bound = SpecialBound, + .thres = Thres, }; -static svfloat32_t NOINLINE -special_case (svfloat32_t x, svfloat32_t y, svbool_t special) +static inline svfloat32_t +sv_exp10f_inline (svfloat32_t x, const svbool_t pg, const struct data *d) { - return sv_call_f32 (exp10f, x, y, special); -} - -/* Single-precision SVE exp10f routine. Implements the same algorithm - as AdvSIMD exp10f. - Worst case error is 1.02 ULPs. - _ZGVsMxv_exp10f(-0x1.040488p-4) got 0x1.ba5f9ep-1 - want 0x1.ba5f9cp-1. */ -svfloat32_t SV_NAME_F1 (exp10) (svfloat32_t x, const svbool_t pg) -{ - const struct data *d = ptr_barrier (&data); /* exp10(x) = 2^(n/N) * 10^r = 2^n * (1 + poly (r)), with poly(r) in [1/sqrt(2), sqrt(2)] and x = r + n * log10(2) / N, with r in [-log10(2)/2N, log10(2)/2N]. */ - /* Load some constants in quad-word chunks to minimise memory access (last - lane is wasted). */ - svfloat32_t log10_2_and_inv = svld1rq (svptrue_b32 (), &d->log10_2); + svfloat32_t lane_consts = svld1rq (svptrue_b32 (), &d->log2_10_lo); /* n = round(x/(log10(2)/N)). */ svfloat32_t shift = sv_f32 (d->shift); - svfloat32_t z = svmla_lane (shift, x, log10_2_and_inv, 0); - svfloat32_t n = svsub_x (pg, z, shift); + svfloat32_t z = svmad_x (pg, sv_f32 (d->log10_2), x, shift); + svfloat32_t n = svsub_x (svptrue_b32 (), z, shift); /* r = x - n*log10(2)/N. */ - svfloat32_t r = svmls_lane (x, n, log10_2_and_inv, 1); - r = svmls_lane (r, n, log10_2_and_inv, 2); + svfloat32_t r = svmsb_x (pg, sv_f32 (d->log2_10_hi), n, x); + r = svmls_lane (r, n, lane_consts, 0); - svbool_t special = svacgt (pg, x, d->special_bound); svfloat32_t scale = svexpa (svreinterpret_u32 (z)); /* Polynomial evaluation: poly(r) ~ exp10(r)-1. */ - svfloat32_t r2 = svmul_x (pg, r, r); - svfloat32_t poly - = svmla_x (pg, svmul_x (pg, r, d->poly[0]), - sv_pairwise_poly_3_f32_x (pg, r, r2, d->poly + 1), r2); - - if (__glibc_unlikely (svptest_any (pg, special))) - return special_case (x, svmla_x (pg, scale, scale, poly), special); + svfloat32_t p12 = svmla_lane (sv_f32 (d->c1), r, lane_consts, 2); + svfloat32_t p34 = svmla_lane (sv_f32 (d->c3), r, lane_consts, 3); + svfloat32_t r2 = svmul_x (svptrue_b32 (), r, r); + svfloat32_t p14 = svmla_x (pg, p12, p34, r2); + svfloat32_t p0 = svmul_lane (r, lane_consts, 1); + svfloat32_t poly = svmla_x (pg, p0, r2, p14); return svmla_x (pg, scale, scale, poly); } + +static svfloat32_t NOINLINE +special_case (svfloat32_t x, svbool_t special, const struct data *d) +{ + return sv_call_f32 (exp10f, x, sv_exp10f_inline (x, svptrue_b32 (), d), + special); +} + +/* Single-precision SVE exp10f routine. Implements the same algorithm + as AdvSIMD exp10f. + Worst case error is 1.02 ULPs. + _ZGVsMxv_exp10f(-0x1.040488p-4) got 0x1.ba5f9ep-1 + want 0x1.ba5f9cp-1. */ +svfloat32_t SV_NAME_F1 (exp10) (svfloat32_t x, const svbool_t pg) +{ + const struct data *d = ptr_barrier (&data); + svbool_t special = svacgt (pg, x, d->thres); + if (__glibc_unlikely (svptest_any (special, special))) + return special_case (x, special, d); + return sv_exp10f_inline (x, pg, d); +} diff --git a/sysdeps/aarch64/fpu/exp2f_sve.c b/sysdeps/aarch64/fpu/exp2f_sve.c index 8a686e3e05..c6216bed9e 100644 --- a/sysdeps/aarch64/fpu/exp2f_sve.c +++ b/sysdeps/aarch64/fpu/exp2f_sve.c @@ -24,54 +24,64 @@ static const struct data { - float poly[5]; + float c0, c2, c4, c1, c3; float shift, thres; } data = { - /* Coefficients copied from the polynomial in AdvSIMD variant, reversed for - compatibility with polynomial helpers. */ - .poly = { 0x1.62e422p-1f, 0x1.ebf9bcp-3f, 0x1.c6bd32p-5f, 0x1.3ce9e4p-7f, - 0x1.59977ap-10f }, + /* Coefficients copied from the polynomial in AdvSIMD variant. */ + .c0 = 0x1.62e422p-1f, + .c1 = 0x1.ebf9bcp-3f, + .c2 = 0x1.c6bd32p-5f, + .c3 = 0x1.3ce9e4p-7f, + .c4 = 0x1.59977ap-10f, /* 1.5*2^17 + 127. */ - .shift = 0x1.903f8p17f, + .shift = 0x1.803f8p17f, /* Roughly 87.3. For x < -Thres, the result is subnormal and not handled correctly by FEXPA. */ .thres = Thres, }; -static svfloat32_t NOINLINE -special_case (svfloat32_t x, svfloat32_t y, svbool_t special) -{ - return sv_call_f32 (exp2f, x, y, special); -} - -/* Single-precision SVE exp2f routine. Implements the same algorithm - as AdvSIMD exp2f. - Worst case error is 1.04 ULPs. - SV_NAME_F1 (exp2)(0x1.943b9p-1) got 0x1.ba7eb2p+0 - want 0x1.ba7ebp+0. */ -svfloat32_t SV_NAME_F1 (exp2) (svfloat32_t x, const svbool_t pg) +static inline svfloat32_t +sv_exp2f_inline (svfloat32_t x, const svbool_t pg, const struct data *d) { - const struct data *d = ptr_barrier (&data); /* exp2(x) = 2^n (1 + poly(r)), with 1 + poly(r) in [1/sqrt(2),sqrt(2)] x = n + r, with r in [-1/2, 1/2]. */ - svfloat32_t shift = sv_f32 (d->shift); - svfloat32_t z = svadd_x (pg, x, shift); - svfloat32_t n = svsub_x (pg, z, shift); - svfloat32_t r = svsub_x (pg, x, n); + svfloat32_t z = svadd_x (svptrue_b32 (), x, d->shift); + svfloat32_t n = svsub_x (svptrue_b32 (), z, d->shift); + svfloat32_t r = svsub_x (svptrue_b32 (), x, n); - svbool_t special = svacgt (pg, x, d->thres); svfloat32_t scale = svexpa (svreinterpret_u32 (z)); /* Polynomial evaluation: poly(r) ~ exp2(r)-1. Evaluate polynomial use hybrid scheme - offset ESTRIN by 1 for coefficients 1 to 4, and apply most significant coefficient directly. */ - svfloat32_t r2 = svmul_x (pg, r, r); - svfloat32_t p14 = sv_pairwise_poly_3_f32_x (pg, r, r2, d->poly + 1); - svfloat32_t p0 = svmul_x (pg, r, d->poly[0]); + svfloat32_t even_coeffs = svld1rq (svptrue_b32 (), &d->c0); + svfloat32_t r2 = svmul_x (svptrue_b32 (), r, r); + svfloat32_t p12 = svmla_lane (sv_f32 (d->c1), r, even_coeffs, 1); + svfloat32_t p34 = svmla_lane (sv_f32 (d->c3), r, even_coeffs, 2); + svfloat32_t p14 = svmla_x (pg, p12, r2, p34); + svfloat32_t p0 = svmul_lane (r, even_coeffs, 0); svfloat32_t poly = svmla_x (pg, p0, r2, p14); - if (__glibc_unlikely (svptest_any (pg, special))) - return special_case (x, svmla_x (pg, scale, scale, poly), special); - return svmla_x (pg, scale, scale, poly); } + +static svfloat32_t NOINLINE +special_case (svfloat32_t x, svbool_t special, const struct data *d) +{ + return sv_call_f32 (exp2f, x, sv_exp2f_inline (x, svptrue_b32 (), d), + special); +} + +/* Single-precision SVE exp2f routine. Implements the same algorithm + as AdvSIMD exp2f. + Worst case error is 1.04 ULPs. + _ZGVsMxv_exp2f(-0x1.af994ap-3) got 0x1.ba6a66p-1 + want 0x1.ba6a64p-1. */ +svfloat32_t SV_NAME_F1 (exp2) (svfloat32_t x, const svbool_t pg) +{ + const struct data *d = ptr_barrier (&data); + svbool_t special = svacgt (pg, x, d->thres); + if (__glibc_unlikely (svptest_any (special, special))) + return special_case (x, special, d); + return sv_exp2f_inline (x, pg, d); +} diff --git a/sysdeps/aarch64/fpu/expf_sve.c b/sysdeps/aarch64/fpu/expf_sve.c index 3ba79bc4f1..da93e01b87 100644 --- a/sysdeps/aarch64/fpu/expf_sve.c +++ b/sysdeps/aarch64/fpu/expf_sve.c @@ -18,33 +18,25 @@ . */ #include "sv_math.h" +#include "sv_expf_inline.h" + +/* Roughly 87.3. For x < -Thres, the result is subnormal and not handled + correctly by FEXPA. */ +#define Thres 0x1.5d5e2ap+6f static const struct data { - float poly[5]; - float inv_ln2, ln2_hi, ln2_lo, shift, thres; + struct sv_expf_data d; + float thres; } data = { - /* Coefficients copied from the polynomial in AdvSIMD variant, reversed for - compatibility with polynomial helpers. */ - .poly = { 0x1.ffffecp-1f, 0x1.fffdb6p-2f, 0x1.555e66p-3f, 0x1.573e2ep-5f, - 0x1.0e4020p-7f }, - .inv_ln2 = 0x1.715476p+0f, - .ln2_hi = 0x1.62e4p-1f, - .ln2_lo = 0x1.7f7d1cp-20f, - /* 1.5*2^17 + 127. */ - .shift = 0x1.903f8p17f, - /* Roughly 87.3. For x < -Thres, the result is subnormal and not handled - correctly by FEXPA. */ - .thres = 0x1.5d5e2ap+6f, + .d = SV_EXPF_DATA, + .thres = Thres, }; -#define C(i) sv_f32 (d->poly[i]) -#define ExponentBias 0x3f800000 - static svfloat32_t NOINLINE -special_case (svfloat32_t x, svfloat32_t y, svbool_t special) +special_case (svfloat32_t x, svbool_t special, const struct sv_expf_data *d) { - return sv_call_f32 (expf, x, y, special); + return sv_call_f32 (expf, x, expf_inline (x, svptrue_b32 (), d), special); } /* Optimised single-precision SVE exp function. @@ -54,36 +46,8 @@ special_case (svfloat32_t x, svfloat32_t y, svbool_t special) svfloat32_t SV_NAME_F1 (exp) (svfloat32_t x, const svbool_t pg) { const struct data *d = ptr_barrier (&data); - - /* exp(x) = 2^n (1 + poly(r)), with 1 + poly(r) in [1/sqrt(2),sqrt(2)] - x = ln2*n + r, with r in [-ln2/2, ln2/2]. */ - - /* Load some constants in quad-word chunks to minimise memory access (last - lane is wasted). */ - svfloat32_t invln2_and_ln2 = svld1rq (svptrue_b32 (), &d->inv_ln2); - - /* n = round(x/(ln2/N)). */ - svfloat32_t z = svmla_lane (sv_f32 (d->shift), x, invln2_and_ln2, 0); - svfloat32_t n = svsub_x (pg, z, d->shift); - - /* r = x - n*ln2/N. */ - svfloat32_t r = svmls_lane (x, n, invln2_and_ln2, 1); - r = svmls_lane (r, n, invln2_and_ln2, 2); - - /* scale = 2^(n/N). */ svbool_t is_special_case = svacgt (pg, x, d->thres); - svfloat32_t scale = svexpa (svreinterpret_u32 (z)); - - /* y = exp(r) - 1 ~= r + C0 r^2 + C1 r^3 + C2 r^4 + C3 r^5 + C4 r^6. */ - svfloat32_t p12 = svmla_x (pg, C (1), C (2), r); - svfloat32_t p34 = svmla_x (pg, C (3), C (4), r); - svfloat32_t r2 = svmul_x (pg, r, r); - svfloat32_t p14 = svmla_x (pg, p12, p34, r2); - svfloat32_t p0 = svmul_x (pg, r, C (0)); - svfloat32_t poly = svmla_x (pg, p0, r2, p14); - if (__glibc_unlikely (svptest_any (pg, is_special_case))) - return special_case (x, svmla_x (pg, scale, scale, poly), is_special_case); - - return svmla_x (pg, scale, scale, poly); + return special_case (x, is_special_case, &d->d); + return expf_inline (x, pg, &d->d); } diff --git a/sysdeps/aarch64/fpu/sv_expf_inline.h b/sysdeps/aarch64/fpu/sv_expf_inline.h index 23963b5f8e..6166df6553 100644 --- a/sysdeps/aarch64/fpu/sv_expf_inline.h +++ b/sysdeps/aarch64/fpu/sv_expf_inline.h @@ -24,19 +24,20 @@ struct sv_expf_data { - float poly[5]; - float inv_ln2, ln2_hi, ln2_lo, shift; + float c1, c3, inv_ln2; + float ln2_lo, c0, c2, c4; + float ln2_hi, shift; }; /* Coefficients copied from the polynomial in AdvSIMD variant, reversed for compatibility with polynomial helpers. Shift is 1.5*2^17 + 127. */ #define SV_EXPF_DATA \ { \ - .poly = { 0x1.ffffecp-1f, 0x1.fffdb6p-2f, 0x1.555e66p-3f, 0x1.573e2ep-5f, \ - 0x1.0e4020p-7f }, \ - \ - .inv_ln2 = 0x1.715476p+0f, .ln2_hi = 0x1.62e4p-1f, \ - .ln2_lo = 0x1.7f7d1cp-20f, .shift = 0x1.803f8p17f, \ + /* Coefficients copied from the polynomial in AdvSIMD variant. */ \ + .c0 = 0x1.ffffecp-1f, .c1 = 0x1.fffdb6p-2f, .c2 = 0x1.555e66p-3f, \ + .c3 = 0x1.573e2ep-5f, .c4 = 0x1.0e4020p-7f, .inv_ln2 = 0x1.715476p+0f, \ + .ln2_hi = 0x1.62e4p-1f, .ln2_lo = 0x1.7f7d1cp-20f, \ + .shift = 0x1.803f8p17f, \ } #define C(i) sv_f32 (d->poly[i]) @@ -47,26 +48,25 @@ expf_inline (svfloat32_t x, const svbool_t pg, const struct sv_expf_data *d) /* exp(x) = 2^n (1 + poly(r)), with 1 + poly(r) in [1/sqrt(2),sqrt(2)] x = ln2*n + r, with r in [-ln2/2, ln2/2]. */ - /* Load some constants in quad-word chunks to minimise memory access. */ - svfloat32_t c4_invln2_and_ln2 = svld1rq (svptrue_b32 (), &d->poly[4]); + svfloat32_t lane_consts = svld1rq (svptrue_b32 (), &d->ln2_lo); /* n = round(x/(ln2/N)). */ - svfloat32_t z = svmla_lane (sv_f32 (d->shift), x, c4_invln2_and_ln2, 1); + svfloat32_t z = svmad_x (pg, sv_f32 (d->inv_ln2), x, d->shift); svfloat32_t n = svsub_x (pg, z, d->shift); /* r = x - n*ln2/N. */ - svfloat32_t r = svmls_lane (x, n, c4_invln2_and_ln2, 2); - r = svmls_lane (r, n, c4_invln2_and_ln2, 3); + svfloat32_t r = svmsb_x (pg, sv_f32 (d->ln2_hi), n, x); + r = svmls_lane (r, n, lane_consts, 0); /* scale = 2^(n/N). */ - svfloat32_t scale = svexpa (svreinterpret_u32_f32 (z)); + svfloat32_t scale = svexpa (svreinterpret_u32 (z)); /* y = exp(r) - 1 ~= r + C0 r^2 + C1 r^3 + C2 r^4 + C3 r^5 + C4 r^6. */ - svfloat32_t p12 = svmla_x (pg, C (1), C (2), r); - svfloat32_t p34 = svmla_lane (C (3), r, c4_invln2_and_ln2, 0); - svfloat32_t r2 = svmul_f32_x (pg, r, r); + svfloat32_t p12 = svmla_lane (sv_f32 (d->c1), r, lane_consts, 2); + svfloat32_t p34 = svmla_lane (sv_f32 (d->c3), r, lane_consts, 3); + svfloat32_t r2 = svmul_x (svptrue_b32 (), r, r); svfloat32_t p14 = svmla_x (pg, p12, p34, r2); - svfloat32_t p0 = svmul_f32_x (pg, r, C (0)); + svfloat32_t p0 = svmul_lane (r, lane_consts, 1); svfloat32_t poly = svmla_x (pg, p0, r2, p14); return svmla_x (pg, scale, scale, poly);