From patchwork Mon Jun 19 14:56:32 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Joe Ramsay X-Patchwork-Id: 1796671 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=sourceware.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=sourceware.org; envelope-from=libc-alpha-bounces+incoming=patchwork.ozlabs.org@sourceware.org; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; secure) header.d=sourceware.org header.i=@sourceware.org header.a=rsa-sha256 header.s=default header.b=wKR9fnJj; dkim-atps=neutral Received: from sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-384) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4QlCXr2slFz20Wk for ; Tue, 20 Jun 2023 00:57:16 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 55E51385734F for ; Mon, 19 Jun 2023 14:57:14 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 55E51385734F DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1687186634; bh=mJUhbyjBkxz1OTvOd+9ODPUiYgR7wcspBsg9ztj3CSY=; h=To:CC:Subject:Date:In-Reply-To:References:List-Id: List-Unsubscribe:List-Archive:List-Post:List-Help:List-Subscribe: From:Reply-To:From; b=wKR9fnJjijCrCFvOGYmSCBpxnCncYHdlbpHxLvZdcfo5kZqnFLz4UOqW5W0tdbj7t kiHygP/8Zo+Jxk0hUdYQ20VcIfYj4E4bPjMuUWG+2y8JGSnY2pa67dpiZ+efbyeVEV GoIytlbkCGxb7cctxBE09U07LLzOImWRGaUZ6bCU= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from EUR05-VI1-obe.outbound.protection.outlook.com (mail-vi1eur05on2051.outbound.protection.outlook.com [40.107.21.51]) by sourceware.org (Postfix) with ESMTPS id 1FF473858D39 for ; Mon, 19 Jun 2023 14:56:54 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 1FF473858D39 Received: from DB7PR05CA0016.eurprd05.prod.outlook.com (2603:10a6:10:36::29) by AS2PR08MB8926.eurprd08.prod.outlook.com (2603:10a6:20b:5f9::15) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6500.36; Mon, 19 Jun 2023 14:56:51 +0000 Received: from DBAEUR03FT029.eop-EUR03.prod.protection.outlook.com (2603:10a6:10:36:cafe::f2) by DB7PR05CA0016.outlook.office365.com (2603:10a6:10:36::29) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6500.36 via Frontend Transport; Mon, 19 Jun 2023 14:56:51 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 63.35.35.123) smtp.mailfrom=arm.com; dkim=pass (signature was verified) header.d=armh.onmicrosoft.com;dmarc=pass action=none header.from=arm.com; Received-SPF: Pass (protection.outlook.com: domain of arm.com designates 63.35.35.123 as permitted sender) receiver=protection.outlook.com; client-ip=63.35.35.123; helo=64aa7808-outbound-1.mta.getcheckrecipient.com; pr=C Received: from 64aa7808-outbound-1.mta.getcheckrecipient.com (63.35.35.123) by DBAEUR03FT029.mail.protection.outlook.com (100.127.142.181) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6521.20 via Frontend Transport; Mon, 19 Jun 2023 14:56:50 +0000 Received: ("Tessian outbound 5154e9d36775:v136"); Mon, 19 Jun 2023 14:56:50 +0000 X-CheckRecipientChecked: true X-CR-MTA-CID: 3e69ff00086046d4 X-CR-MTA-TID: 64aa7808 Received: from f7dc89fe6136.1 by 64aa7808-outbound-1.mta.getcheckrecipient.com id 04313383-B027-4656-807D-D52D98D8EACB.1; Mon, 19 Jun 2023 14:56:43 +0000 Received: from EUR01-VE1-obe.outbound.protection.outlook.com by 64aa7808-outbound-1.mta.getcheckrecipient.com with ESMTPS id f7dc89fe6136.1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384); Mon, 19 Jun 2023 14:56:43 +0000 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=hZ97DB1a6IVquMsQ4CFKBZm1XoKZ1RcpuPAaHmKtwj0HE9+rTy+A9GwQgFhfCJ3dXPJDhAEOvGVMf14xFJWd+OJ41Peh5OVU+axGz2VwoHssZj+Kuth5BTFz4ldmxXVFJLANFOwzWZYR6m3pxrgdlODuXNekTfEVips45UjhBouEhVidGmntjYXeuie5hbXOloBiWdMYXXFz/Tr4HgJ6vC3gcMNFdntKT7imLbZX5OllWc2u54QbTVHDtFYriVTQrKNmcsQhIwtmpPnsOPOWjr16VX9w1GNVAJeWBJWrJERACFnwLqTUYlXY1seqvy4or34HkhK63TiC5Kvps34Ikw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=mJUhbyjBkxz1OTvOd+9ODPUiYgR7wcspBsg9ztj3CSY=; b=jRTygGP8EiBfqGkGLL2OHW/I26U3hJQxh3GZFlDmmXqO3akxckcr7wHxzCaQw4WgRKcIF2bx6Iy0Ek/VRvdwIuiat1HCmwYKo5N4jYtIabk+3uT4fa+q7tmNSFDvFt8eoD9ZXrBsrxam/832n1tKFG+EkCI4GdQyf0rCfZ8cTmH8/39ogn+VBPp16Zwo2UQjTYsK6OPvx4To+wnyCtFIOL9dSxrdz95yJUelcROZNSTnN3XsG9ULpuzx25MaWBZyCn+dYXJnO0XcV8DV3JbwCqzTUlpRwmQzZOiRuvUuyZWp0ly3XqJfow0lgGslqmUsYI4+D13td4+R5Lco1EVl3Q== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 40.67.248.234) smtp.rcpttodomain=sourceware.org smtp.mailfrom=arm.com; dmarc=pass (p=none sp=none pct=100) action=none header.from=arm.com; dkim=none (message not signed); arc=none Received: from AM0PR10CA0116.EURPRD10.PROD.OUTLOOK.COM (2603:10a6:208:e6::33) by DU0PR08MB7392.eurprd08.prod.outlook.com (2603:10a6:10:353::16) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6500.36; Mon, 19 Jun 2023 14:56:40 +0000 Received: from AM7EUR03FT065.eop-EUR03.prod.protection.outlook.com (2603:10a6:208:e6:cafe::ed) by AM0PR10CA0116.outlook.office365.com (2603:10a6:208:e6::33) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6500.36 via Frontend Transport; Mon, 19 Jun 2023 14:56:40 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 40.67.248.234) smtp.mailfrom=arm.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=arm.com; Received-SPF: Pass (protection.outlook.com: domain of arm.com designates 40.67.248.234 as permitted sender) receiver=protection.outlook.com; client-ip=40.67.248.234; helo=nebula.arm.com; pr=C Received: from nebula.arm.com (40.67.248.234) by AM7EUR03FT065.mail.protection.outlook.com (100.127.140.250) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.20.6477.40 via Frontend Transport; Mon, 19 Jun 2023 14:56:40 +0000 Received: from AZ-NEU-EX04.Arm.com (10.251.24.32) by AZ-NEU-EX03.Arm.com (10.251.24.31) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.23; Mon, 19 Jun 2023 14:56:37 +0000 Received: from vcn-man-apps.manchester.arm.com (10.32.108.22) by mail.arm.com (10.251.24.32) with Microsoft SMTP Server id 15.1.2507.23 via Frontend Transport; Mon, 19 Jun 2023 14:56:37 +0000 To: CC: Joe Ramsay Subject: [PATCH v2 2/4] aarch64: Add vector implementations of sin routines Date: Mon, 19 Jun 2023 15:56:32 +0100 Message-ID: <20230619145634.18801-2-Joe.Ramsay@arm.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20230619145634.18801-1-Joe.Ramsay@arm.com> References: <20230619145634.18801-1-Joe.Ramsay@arm.com> MIME-Version: 1.0 X-EOPAttributedMessage: 1 X-MS-TrafficTypeDiagnostic: AM7EUR03FT065:EE_|DU0PR08MB7392:EE_|DBAEUR03FT029:EE_|AS2PR08MB8926:EE_ X-MS-Office365-Filtering-Correlation-Id: 87f05e24-b545-4549-4385-08db70d569db x-checkrecipientrouted: true NoDisclaimer: true X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam-Untrusted: BCL:0; X-Microsoft-Antispam-Message-Info-Original: 0BzAyuz4+Crg3kHIQJYXVKPnDnxIkh1/hPNB3xtTskMdHBmS3pBNmxBi33x2RR3kwqjS5xyIzgd4iwMOb3n6PXwDkLqyFxH/EfEt7occJ6T6wI6W72ESoczkLKh0H4M29T3LWJIw70UAoXoompcURUAs7A7aoXqMxrhIrAt2CpkyBdcHn7twqA+5Wdt4uaXeIJ6/J0I2qwJnG0Gn4MvQfQBs0qm8/TScPFXCUqqNrMSGEEIvj9J8VBYYohiRVPBVczdGhxfwv2qRF+kUnKN57hE7WkajehHivQLxRCflpZJF29g7Bn46lGcAdE/5x5BVGdW3ScMac0io6w+EYfPKFpce6rTtmtCNETVeOcGix5Nfmkqw4biQczMcdC3vfpdYaRioy8ulZDRTf8z7+cZyPwuMuSNOxelNv41b+hSZIZMLdqsZrgxBQI3nQP/d2raE6X1bTw2oFywVsqeUOMwh9dbuZsSNTc1CVdZwh4ylntcKH+fsEozlJX1Ah8CecEGSzyjsRRDCxlwtqSfY2h/7G4B+AyAeP44fRtngJDl85W/QQc3kl1zjX7mTMQbAhNQxbfqOZnndx3C7Fc79zcn9+FaZ/TsPO+vQR5zo51ADRfPcMQulWAbu3F1Y9s13LdxTqlZYLhqzL7aL3TUw23C8W+Wcg+L2GsBneqgAAlayR56RhfyuCVokgItfsUms1KMp8JbM2WGw1K374OtvQ1ZQuKW/ej20giqS0jSMTV7fWnSYqHdAhIGEMDl4KUyc13BglLstlJC+EgCjlliZul0jrg== X-Forefront-Antispam-Report-Untrusted: CIP:40.67.248.234; CTRY:IE; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:nebula.arm.com; PTR:InfoDomainNonexistent; CAT:NONE; SFS:(13230028)(4636009)(376002)(396003)(39860400002)(346002)(136003)(451199021)(36840700001)(40470700004)(46966006)(478600001)(70206006)(70586007)(6916009)(8676002)(36756003)(4326008)(41300700001)(81166007)(47076005)(356005)(40480700001)(2616005)(426003)(83380400001)(336012)(82310400005)(86362001)(7696005)(5660300002)(6666004)(2906002)(316002)(8936002)(30864003)(40460700003)(1076003)(26005)(186003)(82740400003)(36860700001)(2004002)(36900700001); DIR:OUT; SFP:1101; X-MS-Exchange-Transport-CrossTenantHeadersStamped: DU0PR08MB7392 X-MS-Exchange-Transport-CrossTenantHeadersStripped: DBAEUR03FT029.eop-EUR03.prod.protection.outlook.com X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id-Prvs: a5a486b7-831b-4e7a-1968-08db70d56368 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: WIjpFBcsDgZR4mQrIvjkTKif1Dy+x09fisaGKIQjgp40EbhouVkwLqqfX2TP3pFaFubHE5wlLuOV+8gn+9HaHp5r9to/zfq4iQSEhyvOnauCxnJ67RyEet91A/7tUFusW/2w9j8K/zUOAx7rpX+mzvdwdiSri1lCVEzEWNQWzvykP/dSTfIxloMrMhmAKKzkEbcpsw20DSU72VRp/pjeMCtaB+q9mmWsfYWyQ9B3m1cx/9RUTCCbZSCEwJ/uOlZMtvMZKJ6wM+cmx0W67R4GzQ0VFCiCRie6yUtV1jSMR0fSIndhkXdCuVK1MFqC6G9b86Hwaspx7TYQp7UUbNH42LqDz0ZTUHBzlIo0lcoZc/n+QPOOZ9g81tk1FHMaHs8XsgmNnB4ZPjDfYtFKKrpZ2hBMnJfwtEhniG5X5/sgUx+lz+BpO8MI6OQH7+v5r5Lo9hRNY5N4RWGjcLpCHUcnyrmc8fxBMig8q21WkO8UGSHDIWGfbjpSSv3Eba1HmVB93vr42NtFmhhFGLVmhGJxXfCwKnIz4UDPE7vWXtHlazYbFodsxdHXOMqBliYAEaJuX1L/Gfqs+EHLFeKvocxFfl1rMta597BT7X4xvxvrQJIgLvP9Fx711AS5RwQeQP0xJk5bY19cr45CMMAz8WBf7+vriHHmkNjOT4djhREmOK+LVyM07coTZXFas2o+ipZB9I7gxcyha0ax6JDHnffbOv9XeQW6UmBrhgJpduW3+l0jBIztyMNpQOkG2AZNHo9e X-Forefront-Antispam-Report: CIP:63.35.35.123; CTRY:IE; LANG:en; SCL:1; SRV:; IPV:CAL; SFV:NSPM; H:64aa7808-outbound-1.mta.getcheckrecipient.com; PTR:ec2-63-35-35-123.eu-west-1.compute.amazonaws.com; CAT:NONE; SFS:(13230028)(4636009)(136003)(346002)(396003)(39860400002)(376002)(451199021)(40470700004)(46966006)(36840700001)(81166007)(336012)(426003)(2616005)(86362001)(47076005)(30864003)(2906002)(36860700001)(83380400001)(82740400003)(5660300002)(40460700003)(8936002)(70586007)(186003)(70206006)(8676002)(36756003)(26005)(41300700001)(478600001)(4326008)(6916009)(40480700001)(1076003)(7696005)(6666004)(82310400005)(316002)(2004002); DIR:OUT; SFP:1101; X-OriginatorOrg: arm.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 19 Jun 2023 14:56:50.9223 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 87f05e24-b545-4549-4385-08db70d569db X-MS-Exchange-CrossTenant-Id: f34e5979-57d9-4aaa-ad4d-b122a662184d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=f34e5979-57d9-4aaa-ad4d-b122a662184d; Ip=[63.35.35.123]; Helo=[64aa7808-outbound-1.mta.getcheckrecipient.com] X-MS-Exchange-CrossTenant-AuthSource: DBAEUR03FT029.eop-EUR03.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: AS2PR08MB8926 X-Spam-Status: No, score=-12.2 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, FORGED_SPF_HELO, GIT_PATCH_0, KAM_DMARC_NONE, KAM_SHORT, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2, SPF_HELO_PASS, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE, UNPARSEABLE_RELAY autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Joe Ramsay via Libc-alpha From: Joe Ramsay Reply-To: Joe Ramsay Errors-To: libc-alpha-bounces+incoming=patchwork.ozlabs.org@sourceware.org Sender: "Libc-alpha" Optimised implementations for single and double precision, Advanced SIMD and SVE, copied from Arm Optimized Routines. Also allow certain tests to be skipped for mathvec routines, for example both AdvSIMD algorithms discard the sign of 0. As previously, data tables are marked volatile or writable to prevent overly aggressive constant inlining. Special-case handlers are marked NOINLINE to avoid incurring the penalty of switching call standards unnecessarily. --- Changes to v1: * Use __glibc_unlikely * Remove polynomial helper macros, as they do not improve code for small polynomials * Explain data table storage math/auto-libm-test-out-sin | 4 +- math/gen-libm-test.py | 3 +- sysdeps/aarch64/fpu/Makefile | 8 +- sysdeps/aarch64/fpu/Versions | 4 + sysdeps/aarch64/fpu/bits/math-vector.h | 6 ++ sysdeps/aarch64/fpu/sin_advsimd.c | 100 ++++++++++++++++++ sysdeps/aarch64/fpu/sin_sve.c | 96 +++++++++++++++++ sysdeps/aarch64/fpu/sinf_advsimd.c | 93 ++++++++++++++++ sysdeps/aarch64/fpu/sinf_sve.c | 94 ++++++++++++++++ .../fpu/test-double-advsimd-wrappers.c | 1 + .../aarch64/fpu/test-double-sve-wrappers.c | 1 + .../aarch64/fpu/test-float-advsimd-wrappers.c | 1 + sysdeps/aarch64/fpu/test-float-sve-wrappers.c | 1 + sysdeps/aarch64/libm-test-ulps | 8 ++ .../unix/sysv/linux/aarch64/libmvec.abilist | 4 + 15 files changed, 417 insertions(+), 7 deletions(-) create mode 100644 sysdeps/aarch64/fpu/sin_advsimd.c create mode 100644 sysdeps/aarch64/fpu/sin_sve.c create mode 100644 sysdeps/aarch64/fpu/sinf_advsimd.c create mode 100644 sysdeps/aarch64/fpu/sinf_sve.c diff --git a/math/auto-libm-test-out-sin b/math/auto-libm-test-out-sin index f1d21b179c..27ccaff1aa 100644 --- a/math/auto-libm-test-out-sin +++ b/math/auto-libm-test-out-sin @@ -25,11 +25,11 @@ sin 0 = sin upward ibm128 0x0p+0 : 0x0p+0 : inexact-ok sin -0 = sin downward binary32 -0x0p+0 : -0x0p+0 : inexact-ok -= sin tonearest binary32 -0x0p+0 : -0x0p+0 : inexact-ok += sin tonearest binary32 -0x0p+0 : -0x0p+0 : inexact-ok no-mathvec = sin towardzero binary32 -0x0p+0 : -0x0p+0 : inexact-ok = sin upward binary32 -0x0p+0 : -0x0p+0 : inexact-ok = sin downward binary64 -0x0p+0 : -0x0p+0 : inexact-ok -= sin tonearest binary64 -0x0p+0 : -0x0p+0 : inexact-ok += sin tonearest binary64 -0x0p+0 : -0x0p+0 : inexact-ok no-mathvec = sin towardzero binary64 -0x0p+0 : -0x0p+0 : inexact-ok = sin upward binary64 -0x0p+0 : -0x0p+0 : inexact-ok = sin downward intel96 -0x0p+0 : -0x0p+0 : inexact-ok diff --git a/math/gen-libm-test.py b/math/gen-libm-test.py index 6ae78beb01..a573c3b8cb 100755 --- a/math/gen-libm-test.py +++ b/math/gen-libm-test.py @@ -93,7 +93,8 @@ BEAUTIFY_MAP = {'minus_zero': '-0', # Flags in auto-libm-test-out that map directly to C flags. FLAGS_SIMPLE = {'ignore-zero-inf-sign': 'IGNORE_ZERO_INF_SIGN', - 'xfail': 'XFAIL_TEST'} + 'xfail': 'XFAIL_TEST', + 'no-mathvec': 'NO_TEST_MATHVEC'} # Exceptions in auto-libm-test-out, and their corresponding C flags # for being required, OK or required to be absent. diff --git a/sysdeps/aarch64/fpu/Makefile b/sysdeps/aarch64/fpu/Makefile index 850cfb9012..b3285542ea 100644 --- a/sysdeps/aarch64/fpu/Makefile +++ b/sysdeps/aarch64/fpu/Makefile @@ -1,10 +1,10 @@ -float-advsimd-funcs = cos +float-advsimd-funcs = cos sin -double-advsimd-funcs = cos +double-advsimd-funcs = cos sin -float-sve-funcs = cos +float-sve-funcs = cos sin -double-sve-funcs = cos +double-sve-funcs = cos sin ifeq ($(subdir),mathvec) libmvec-support = $(addsuffix f_advsimd,$(float-advsimd-funcs)) \ diff --git a/sysdeps/aarch64/fpu/Versions b/sysdeps/aarch64/fpu/Versions index 5222a6f180..d26b3968a9 100644 --- a/sysdeps/aarch64/fpu/Versions +++ b/sysdeps/aarch64/fpu/Versions @@ -1,8 +1,12 @@ libmvec { GLIBC_2.38 { _ZGVnN2v_cos; + _ZGVnN2v_sin; _ZGVnN4v_cosf; + _ZGVnN4v_sinf; _ZGVsMxv_cos; _ZGVsMxv_cosf; + _ZGVsMxv_sin; + _ZGVsMxv_sinf; } } diff --git a/sysdeps/aarch64/fpu/bits/math-vector.h b/sysdeps/aarch64/fpu/bits/math-vector.h index a2f2277591..ad9c9945e8 100644 --- a/sysdeps/aarch64/fpu/bits/math-vector.h +++ b/sysdeps/aarch64/fpu/bits/math-vector.h @@ -50,7 +50,10 @@ typedef __SVBool_t __sv_bool_t; # define __vpcs __attribute__ ((__aarch64_vector_pcs__)) __vpcs __f32x4_t _ZGVnN4v_cosf (__f32x4_t); +__vpcs __f32x4_t _ZGVnN4v_sinf (__f32x4_t); + __vpcs __f64x2_t _ZGVnN2v_cos (__f64x2_t); +__vpcs __f64x2_t _ZGVnN2v_sin (__f64x2_t); # undef __ADVSIMD_VEC_MATH_SUPPORTED #endif /* __ADVSIMD_VEC_MATH_SUPPORTED */ @@ -58,7 +61,10 @@ __vpcs __f64x2_t _ZGVnN2v_cos (__f64x2_t); #ifdef __SVE_VEC_MATH_SUPPORTED __sv_f32_t _ZGVsMxv_cosf (__sv_f32_t, __sv_bool_t); +__sv_f32_t _ZGVsMxv_sinf (__sv_f32_t, __sv_bool_t); + __sv_f64_t _ZGVsMxv_cos (__sv_f64_t, __sv_bool_t); +__sv_f64_t _ZGVsMxv_sin (__sv_f64_t, __sv_bool_t); # undef __SVE_VEC_MATH_SUPPORTED #endif /* __SVE_VEC_MATH_SUPPORTED */ diff --git a/sysdeps/aarch64/fpu/sin_advsimd.c b/sysdeps/aarch64/fpu/sin_advsimd.c new file mode 100644 index 0000000000..b64129b57b --- /dev/null +++ b/sysdeps/aarch64/fpu/sin_advsimd.c @@ -0,0 +1,100 @@ +/* Double-precision vector (Advanced SIMD) sin function. + + Copyright (C) 2023 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include "v_math.h" + +static const volatile struct +{ + float64x2_t poly[7]; + float64x2_t range_val, inv_pi, shift, pi_1, pi_2, pi_3; +} data = { + /* Worst-case error is 2.8 ulp in [-pi/2, pi/2]. */ + .poly = { V2 (-0x1.555555555547bp-3), V2 (0x1.1111111108a4dp-7), + V2 (-0x1.a01a019936f27p-13), V2 (0x1.71de37a97d93ep-19), + V2 (-0x1.ae633919987c6p-26), V2 (0x1.60e277ae07cecp-33), + V2 (-0x1.9e9540300a1p-41) }, + + .range_val = V2 (0x1p23), + .inv_pi = V2 (0x1.45f306dc9c883p-2), + .pi_1 = V2 (0x1.921fb54442d18p+1), + .pi_2 = V2 (0x1.1a62633145c06p-53), + .pi_3 = V2 (0x1.c1cd129024e09p-106), + .shift = V2 (0x1.8p52), +}; + +#if WANT_SIMD_EXCEPT +# define TinyBound v_u64 (0x3000000000000000) /* asuint64 (0x1p-255). */ +# define Thresh v_u64 (0x1160000000000000) /* RangeVal - TinyBound. */ +#endif + +#define C(i) data.poly[i] + +static float64x2_t VPCS_ATTR NOINLINE +special_case (float64x2_t x, float64x2_t y, uint64x2_t odd, uint64x2_t cmp) +{ + y = vreinterpretq_f64_u64 (veorq_u64 (vreinterpretq_u64_f64 (y), odd)); + return v_call_f64 (sin, x, y, cmp); +} + +float64x2_t VPCS_ATTR V_NAME_D1 (sin) (float64x2_t x) +{ + float64x2_t n, r, r2, r3, r4, y, t1, t2, t3; + uint64x2_t odd, cmp; + +#if WANT_SIMD_EXCEPT + /* Detect |x| <= TinyBound or |x| >= RangeVal. If fenv exceptions are to be + triggered correctly, set any special lanes to 1 (which is neutral w.r.t. + fenv). These lanes will be fixed by special-case handler later. */ + uint64x2_t ir = vreinterpretq_u64_f64 (vabsq_f64 (x)); + cmp = vcgeq_u64 (vsubq_u64 (ir, TinyBound), Thresh); + r = vbslq_f64 (cmp, vreinterpretq_f64_u64 (cmp), x); +#else + r = x; + cmp = vcageq_f64 (data.range_val, x); + cmp = vceqzq_u64 (cmp); /* cmp = ~cmp. */ +#endif + + /* n = rint(|x|/pi). */ + n = vfmaq_f64 (data.shift, data.inv_pi, r); + odd = vshlq_n_u64 (vreinterpretq_u64_f64 (n), 63); + n = vsubq_f64 (n, data.shift); + + /* r = |x| - n*pi (range reduction into -pi/2 .. pi/2). */ + r = vfmsq_f64 (r, data.pi_1, n); + r = vfmsq_f64 (r, data.pi_2, n); + r = vfmsq_f64 (r, data.pi_3, n); + + /* sin(r) poly approx. */ + r2 = vmulq_f64 (r, r); + r3 = vmulq_f64 (r2, r); + r4 = vmulq_f64 (r2, r2); + + t1 = vfmaq_f64 (C (4), C (5), r2); + t2 = vfmaq_f64 (C (2), C (3), r2); + t3 = vfmaq_f64 (C (0), C (1), r2); + + y = vfmaq_f64 (t1, C (6), r4); + y = vfmaq_f64 (t2, y, r4); + y = vfmaq_f64 (t3, y, r4); + y = vfmaq_f64 (r, y, r3); + + if (__glibc_unlikely (v_any_u64 (cmp))) + return special_case (x, y, odd, cmp); + return vreinterpretq_f64_u64 (veorq_u64 (vreinterpretq_u64_f64 (y), odd)); +} diff --git a/sysdeps/aarch64/fpu/sin_sve.c b/sysdeps/aarch64/fpu/sin_sve.c new file mode 100644 index 0000000000..482fc326ba --- /dev/null +++ b/sysdeps/aarch64/fpu/sin_sve.c @@ -0,0 +1,96 @@ +/* Double-precision vector (SVE) sin function. + + Copyright (C) 2023 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include "sv_math.h" + +static struct +{ + double inv_pi, half_pi, inv_pi_over_2, pi_over_2_1, pi_over_2_2, pi_over_2_3, + shift; +} data = { + /* Polynomial coefficients are hard-wired in the FTMAD instruction. */ + .inv_pi = 0x1.45f306dc9c883p-2, + .half_pi = 0x1.921fb54442d18p+0, + .inv_pi_over_2 = 0x1.45f306dc9c882p-1, + .pi_over_2_1 = 0x1.921fb50000000p+0, + .pi_over_2_2 = 0x1.110b460000000p-26, + .pi_over_2_3 = 0x1.1a62633145c07p-54, + .shift = 0x1.8p52 +}; + +#define RangeVal 0x4160000000000000 /* asuint64 (0x1p23). */ + +static svfloat64_t NOINLINE +special_case (svfloat64_t x, svfloat64_t y, svbool_t cmp) +{ + return sv_call_f64 (sin, x, y, cmp); +} + +/* A fast SVE implementation of sin based on trigonometric + instructions (FTMAD, FTSSEL, FTSMUL). + Maximum observed error in 2.52 ULP: + SV_NAME_D1 (sin)(0x1.2d2b00df69661p+19) got 0x1.10ace8f3e786bp-40 + want 0x1.10ace8f3e7868p-40. */ +svfloat64_t SV_NAME_D1 (sin) (svfloat64_t x, const svbool_t pg) +{ + svfloat64_t r = svabs_f64_x (pg, x); + svuint64_t sign + = sveor_u64_x (pg, svreinterpret_u64_f64 (x), svreinterpret_u64_f64 (r)); + svbool_t cmp = svcmpge_n_u64 (pg, svreinterpret_u64_f64 (r), RangeVal); + + /* Load first two pio2-related constants to one vector. */ + svfloat64_t invpio2_and_pio2_1 + = svld1rq_f64 (svptrue_b64 (), &data.inv_pi_over_2); + + /* n = rint(|x|/(pi/2)). */ + svfloat64_t q + = svmla_lane_f64 (sv_f64 (data.shift), r, invpio2_and_pio2_1, 0); + svfloat64_t n = svsub_n_f64_x (pg, q, data.shift); + + /* r = |x| - n*(pi/2) (range reduction into -pi/4 .. pi/4). */ + r = svmls_lane_f64 (r, n, invpio2_and_pio2_1, 1); + r = svmls_n_f64_x (pg, r, n, data.pi_over_2_2); + r = svmls_n_f64_x (pg, r, n, data.pi_over_2_3); + + /* Final multiplicative factor: 1.0 or x depending on bit #0 of q. */ + svfloat64_t f = svtssel_f64 (r, svreinterpret_u64_f64 (q)); + + /* sin(r) poly approx. */ + svfloat64_t r2 = svtsmul_f64 (r, svreinterpret_u64_f64 (q)); + svfloat64_t y = sv_f64 (0.0); + y = svtmad_f64 (y, r2, 7); + y = svtmad_f64 (y, r2, 6); + y = svtmad_f64 (y, r2, 5); + y = svtmad_f64 (y, r2, 4); + y = svtmad_f64 (y, r2, 3); + y = svtmad_f64 (y, r2, 2); + y = svtmad_f64 (y, r2, 1); + y = svtmad_f64 (y, r2, 0); + + /* Apply factor. */ + y = svmul_f64_x (pg, f, y); + + /* sign = y^sign. */ + y = svreinterpret_f64_u64 ( + sveor_u64_x (pg, svreinterpret_u64_f64 (y), sign)); + + if (__glibc_unlikely (svptest_any (pg, cmp))) + return special_case (x, y, cmp); + return y; +} diff --git a/sysdeps/aarch64/fpu/sinf_advsimd.c b/sysdeps/aarch64/fpu/sinf_advsimd.c new file mode 100644 index 0000000000..df45d99737 --- /dev/null +++ b/sysdeps/aarch64/fpu/sinf_advsimd.c @@ -0,0 +1,93 @@ +/* Single-precision vector (Advanced SIMD) sin function. + + Copyright (C) 2023 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include "v_math.h" + +static const volatile struct +{ + float32x4_t poly[4]; + float32x4_t range_val, inv_pi, shift, pi_1, pi_2, pi_3; +} data = { + /* 1.886 ulp error. */ + .poly = { V4 (-0x1.555548p-3f), V4 (0x1.110df4p-7f), V4 (-0x1.9f42eap-13f), + V4 (0x1.5b2e76p-19f) }, + + .pi_1 = V4 (0x1.921fb6p+1f), + .pi_2 = V4 (-0x1.777a5cp-24f), + .pi_3 = V4 (-0x1.ee59dap-49f), + + .inv_pi = V4 (0x1.45f306p-2f), + .shift = V4 (0x1.8p+23f), + .range_val = V4 (0x1p20f) +}; + +#if WANT_SIMD_EXCEPT +# define TinyBound v_u32 (0x21000000) /* asuint32(0x1p-61f). */ +# define Thresh v_u32 (0x28800000) /* RangeVal - TinyBound. */ +#endif + +#define C(i) data.poly[i] + +static float32x4_t VPCS_ATTR NOINLINE +special_case (float32x4_t x, float32x4_t y, uint32x4_t odd, uint32x4_t cmp) +{ + /* Fall back to scalar code. */ + y = vreinterpretq_f32_u32 (veorq_u32 (vreinterpretq_u32_f32 (y), odd)); + return v_call_f32 (sinf, x, y, cmp); +} + +float32x4_t VPCS_ATTR V_NAME_F1 (sin) (float32x4_t x) +{ + float32x4_t n, r, r2, y; + uint32x4_t odd, cmp; + +#if WANT_SIMD_EXCEPT + uint32x4_t ir = vreinterpretq_u32_f32 (vabsq_f32 (x)); + cmp = vcgeq_u32 (vsubq_u32 (ir, TinyBound), Thresh); + /* If fenv exceptions are to be triggered correctly, set any special lanes + to 1 (which is neutral w.r.t. fenv). These lanes will be fixed by + special-case handler later. */ + r = vbslq_f32 (cmp, vreinterpretq_f32_u32 (cmp), x); +#else + r = x; + cmp = vcageq_f32 (data.range_val, x); + cmp = vceqzq_u32 (cmp); /* cmp = ~cmp. */ +#endif + + /* n = rint(|x|/pi) */ + n = vfmaq_f32 (data.shift, data.inv_pi, r); + odd = vshlq_n_u32 (vreinterpretq_u32_f32 (n), 31); + n = vsubq_f32 (n, data.shift); + + /* r = |x| - n*pi (range reduction into -pi/2 .. pi/2) */ + r = vfmsq_f32 (r, data.pi_1, n); + r = vfmsq_f32 (r, data.pi_2, n); + r = vfmsq_f32 (r, data.pi_3, n); + + /* y = sin(r) */ + r2 = vmulq_f32 (r, r); + y = vfmaq_f32 (C (2), C (3), r2); + y = vfmaq_f32 (C (1), y, r2); + y = vfmaq_f32 (C (0), y, r2); + y = vfmaq_f32 (r, vmulq_f32 (y, r2), r); + + if (__glibc_unlikely (v_any_u32 (cmp))) + return special_case (x, y, odd, cmp); + return vreinterpretq_f32_u32 (veorq_u32 (vreinterpretq_u32_f32 (y), odd)); +} diff --git a/sysdeps/aarch64/fpu/sinf_sve.c b/sysdeps/aarch64/fpu/sinf_sve.c new file mode 100644 index 0000000000..54df1aa860 --- /dev/null +++ b/sysdeps/aarch64/fpu/sinf_sve.c @@ -0,0 +1,94 @@ +/* Single-precision vector (SVE) sin function. + + Copyright (C) 2023 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include "sv_math.h" + +static struct +{ + float poly[4]; + /* Pi-related values to be loaded as one quad-word and used with + svmla_lane_f32. */ + float negpi1, negpi2, negpi3, invpi; + float shift; +} data = { + .poly = { + /* Non-zero coefficients from the degree 9 Taylor series expansion of + sin. */ + -0x1.555548p-3f, 0x1.110df4p-7f, -0x1.9f42eap-13f, 0x1.5b2e76p-19f + }, + .negpi1 = -0x1.921fb6p+1f, + .negpi2 = 0x1.777a5cp-24f, + .negpi3 = 0x1.ee59dap-49f, + .invpi = 0x1.45f306p-2f, + .shift = 0x1.8p+23f +}; + +#define RangeVal 0x49800000 /* asuint32 (0x1p20f). */ +#define C(i) sv_f32 (data.poly[i]) + +static svfloat32_t NOINLINE +special_case (svfloat32_t x, svfloat32_t y, svbool_t cmp) +{ + return sv_call_f32 (sinf, x, y, cmp); +} + +/* A fast SVE implementation of sinf. + Maximum error: 1.89 ULPs. + This maximum error is achieved at multiple values in [-2^18, 2^18] + but one example is: + SV_NAME_F1 (sin)(0x1.9247a4p+0) got 0x1.fffff6p-1 want 0x1.fffffap-1. */ +svfloat32_t SV_NAME_F1 (sin) (svfloat32_t x, const svbool_t pg) +{ + svfloat32_t ax = svabs_f32_x (pg, x); + svuint32_t sign = sveor_u32_x (pg, svreinterpret_u32_f32 (x), + svreinterpret_u32_f32 (ax)); + svbool_t cmp = svcmpge_n_u32 (pg, svreinterpret_u32_f32 (ax), RangeVal); + + /* pi_vals are a quad-word of helper values - the first 3 elements contain + -pi in extended precision, the last contains 1 / pi. */ + svfloat32_t pi_vals = svld1rq_f32 (svptrue_b32 (), &data.negpi1); + + /* n = rint(|x|/pi). */ + svfloat32_t n = svmla_lane_f32 (sv_f32 (data.shift), ax, pi_vals, 3); + svuint32_t odd = svlsl_n_u32_x (pg, svreinterpret_u32_f32 (n), 31); + n = svsub_n_f32_x (pg, n, data.shift); + + /* r = |x| - n*pi (range reduction into -pi/2 .. pi/2). */ + svfloat32_t r; + r = svmla_lane_f32 (ax, n, pi_vals, 0); + r = svmla_lane_f32 (r, n, pi_vals, 1); + r = svmla_lane_f32 (r, n, pi_vals, 2); + + /* sin(r) approx using a degree 9 polynomial from the Taylor series + expansion. Note that only the odd terms of this are non-zero. */ + svfloat32_t r2 = svmul_f32_x (pg, r, r); + svfloat32_t y; + y = svmla_f32_x (pg, C (2), r2, C (3)); + y = svmla_f32_x (pg, C (1), r2, y); + y = svmla_f32_x (pg, C (0), r2, y); + y = svmla_f32_x (pg, r, r, svmul_f32_x (pg, y, r2)); + + /* sign = y^sign^odd. */ + y = svreinterpret_f32_u32 (sveor_u32_x (pg, svreinterpret_u32_f32 (y), + sveor_u32_x (pg, sign, odd))); + + if (__glibc_unlikely (svptest_any (pg, cmp))) + return special_case (x, y, cmp); + return y; +} diff --git a/sysdeps/aarch64/fpu/test-double-advsimd-wrappers.c b/sysdeps/aarch64/fpu/test-double-advsimd-wrappers.c index cb45fd3298..4af97a25a2 100644 --- a/sysdeps/aarch64/fpu/test-double-advsimd-wrappers.c +++ b/sysdeps/aarch64/fpu/test-double-advsimd-wrappers.c @@ -24,3 +24,4 @@ #define VEC_TYPE float64x2_t VPCS_VECTOR_WRAPPER (cos_advsimd, _ZGVnN2v_cos) +VPCS_VECTOR_WRAPPER (sin_advsimd, _ZGVnN2v_sin) diff --git a/sysdeps/aarch64/fpu/test-double-sve-wrappers.c b/sysdeps/aarch64/fpu/test-double-sve-wrappers.c index cf72ef83b7..64c790adc5 100644 --- a/sysdeps/aarch64/fpu/test-double-sve-wrappers.c +++ b/sysdeps/aarch64/fpu/test-double-sve-wrappers.c @@ -33,3 +33,4 @@ } SVE_VECTOR_WRAPPER (cos_sve, _ZGVsMxv_cos) +SVE_VECTOR_WRAPPER (sin_sve, _ZGVsMxv_sin) diff --git a/sysdeps/aarch64/fpu/test-float-advsimd-wrappers.c b/sysdeps/aarch64/fpu/test-float-advsimd-wrappers.c index fa146862b0..50e776b952 100644 --- a/sysdeps/aarch64/fpu/test-float-advsimd-wrappers.c +++ b/sysdeps/aarch64/fpu/test-float-advsimd-wrappers.c @@ -24,3 +24,4 @@ #define VEC_TYPE float32x4_t VPCS_VECTOR_WRAPPER (cosf_advsimd, _ZGVnN4v_cosf) +VPCS_VECTOR_WRAPPER (sinf_advsimd, _ZGVnN4v_sinf) diff --git a/sysdeps/aarch64/fpu/test-float-sve-wrappers.c b/sysdeps/aarch64/fpu/test-float-sve-wrappers.c index bc26558c62..7355032929 100644 --- a/sysdeps/aarch64/fpu/test-float-sve-wrappers.c +++ b/sysdeps/aarch64/fpu/test-float-sve-wrappers.c @@ -33,3 +33,4 @@ } SVE_VECTOR_WRAPPER (cosf_sve, _ZGVsMxv_cosf) +SVE_VECTOR_WRAPPER (sinf_sve, _ZGVsMxv_sinf) diff --git a/sysdeps/aarch64/libm-test-ulps b/sysdeps/aarch64/libm-test-ulps index 07da4ab843..4145662b2d 100644 --- a/sysdeps/aarch64/libm-test-ulps +++ b/sysdeps/aarch64/libm-test-ulps @@ -1257,11 +1257,19 @@ double: 1 float: 1 ldouble: 2 +Function: "sin_advsimd": +double: 2 +float: 1 + Function: "sin_downward": double: 1 float: 1 ldouble: 3 +Function: "sin_sve": +double: 2 +float: 1 + Function: "sin_towardzero": double: 1 float: 1 diff --git a/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist b/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist index 13af421af2..a4c564859c 100644 --- a/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist +++ b/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist @@ -1,4 +1,8 @@ GLIBC_2.38 _ZGVnN2v_cos F +GLIBC_2.38 _ZGVnN2v_sin F GLIBC_2.38 _ZGVnN4v_cosf F +GLIBC_2.38 _ZGVnN4v_sinf F GLIBC_2.38 _ZGVsMxv_cos F GLIBC_2.38 _ZGVsMxv_cosf F +GLIBC_2.38 _ZGVsMxv_sin F +GLIBC_2.38 _ZGVsMxv_sinf F