From patchwork Thu Apr 13 14:29:58 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Wilco Dijkstra X-Patchwork-Id: 1768506 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=sourceware.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=sourceware.org; envelope-from=libc-alpha-bounces+incoming=patchwork.ozlabs.org@sourceware.org; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; secure) header.d=sourceware.org header.i=@sourceware.org header.a=rsa-sha256 header.s=default header.b=Hw/YvyIT; dkim-atps=neutral Received: from sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-384) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4Py2703b2Gz1yZZ for ; Fri, 14 Apr 2023 00:30:36 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 538233858C78 for ; Thu, 13 Apr 2023 14:30:34 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 538233858C78 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1681396234; bh=sDyVVBU5z/ofO6PMsCoqnx9mErNXGoszG/OoPCxZvnU=; h=To:CC:Subject:Date:List-Id:List-Unsubscribe:List-Archive: List-Post:List-Help:List-Subscribe:From:Reply-To:From; b=Hw/YvyITDwPOJhuho+eJ09ZxVWY0htOFO6f0/qWf8vYp5NFJqL40BXbA1EkAi4r1m UAYp0K1AAzP0jgBd2CDoqmRhzn108P6HsNK4l25KcY9ZLMs/zZuuNOJBe4Zc0qEj4n qhk93fanZtKcYq0X/gix20KBKC2//wFMohagny6c= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from EUR05-VI1-obe.outbound.protection.outlook.com (mail-vi1eur05on2057.outbound.protection.outlook.com [40.107.21.57]) by sourceware.org (Postfix) with ESMTPS id 910153858D20 for ; Thu, 13 Apr 2023 14:30:16 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 910153858D20 Received: from DB8PR06CA0038.eurprd06.prod.outlook.com (2603:10a6:10:120::12) by PAWPR08MB10118.eurprd08.prod.outlook.com (2603:10a6:102:368::19) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6277.41; Thu, 13 Apr 2023 14:30:07 +0000 Received: from DBAEUR03FT021.eop-EUR03.prod.protection.outlook.com (2603:10a6:10:120:cafe::28) by DB8PR06CA0038.outlook.office365.com (2603:10a6:10:120::12) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6298.32 via Frontend Transport; Thu, 13 Apr 2023 14:30:07 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 63.35.35.123) smtp.mailfrom=arm.com; dkim=pass (signature was verified) header.d=armh.onmicrosoft.com;dmarc=pass action=none header.from=arm.com; Received-SPF: Pass (protection.outlook.com: domain of arm.com designates 63.35.35.123 as permitted sender) receiver=protection.outlook.com; client-ip=63.35.35.123; helo=64aa7808-outbound-1.mta.getcheckrecipient.com; pr=C Received: from 64aa7808-outbound-1.mta.getcheckrecipient.com (63.35.35.123) by DBAEUR03FT021.mail.protection.outlook.com (100.127.142.184) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6298.31 via Frontend Transport; Thu, 13 Apr 2023 14:30:06 +0000 Received: ("Tessian outbound 3a01b65b5aad:v136"); Thu, 13 Apr 2023 14:30:06 +0000 X-CheckRecipientChecked: true X-CR-MTA-CID: 1b3e8b515e389b2b X-CR-MTA-TID: 64aa7808 Received: from 084bd5223863.1 by 64aa7808-outbound-1.mta.getcheckrecipient.com id 8FE65D49-B4F9-4785-A7E9-FD2489F93DFF.1; Thu, 13 Apr 2023 14:30:00 +0000 Received: from EUR05-DB8-obe.outbound.protection.outlook.com by 64aa7808-outbound-1.mta.getcheckrecipient.com with ESMTPS id 084bd5223863.1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384); Thu, 13 Apr 2023 14:30:00 +0000 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=ZgUmhTUI012iPuQV9D6IRANayhhscg1zNnEsHSsWCVdB0BcwRs5V2O8/Y8cBkcntVw4c1XxXRRltmoOQ03tYoiH2sJwZWmpRIvkdJENWwRim8SsfLblvKgJ3dIE3Fr8xXEx8fJlYgWWhWEmWQ1ryUoyhsyc+Se+K7smdbWsBdWioGiQg+uzpgjQ3uQjjfY/jUY6hg0b9BGEMsQwgBmUO7NuXGe6B1u77//iQzP4sxQxUHkkSL31N4bg5yWq9jkueDslUPc1Befej1WK4cenvWKEQvPCNQRGrNO/97H9Uyfy4vhQQahpESanEAG3iA8UKun7jli0GXl8+iLpaXoMShg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=sDyVVBU5z/ofO6PMsCoqnx9mErNXGoszG/OoPCxZvnU=; b=g880bHShtXxQDFkh4E7PIbA20pUmoqfBEw6tsYCkpeKMeWDDFB1t/1HX/8Pa/WwIinFKDzhKVH5Nk9Adj4g4pYHcC1IjkahDNVVU/uGlbETjlf0dNbrL8AlX6l3ApBiZuJQ5cOiR4XEk7b9knHEJyUI7oGWkoc2pb2XMTT4ToOnBpK5AFze5M413tnMu4O1tIckv9hk6YY0BV0NTufPknfM9uAPsQcU1fW94fskTkYKNSN3XtFl5PJ7MsNXWVDhB/PhCEchN9a5uQ8N9W0HrCCgj+CjHwm7MdOrvo4rY1o2Y2WFbaGYu5fJqNOxVNm9+caqbnW18fCC8ethQaQcL/g== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=arm.com; dmarc=pass action=none header.from=arm.com; dkim=pass header.d=arm.com; arc=none Received: from PAWPR08MB8982.eurprd08.prod.outlook.com (2603:10a6:102:33f::20) by AS2PR08MB9870.eurprd08.prod.outlook.com (2603:10a6:20b:595::10) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6277.38; Thu, 13 Apr 2023 14:29:58 +0000 Received: from PAWPR08MB8982.eurprd08.prod.outlook.com ([fe80::13be:967d:6e80:432f]) by PAWPR08MB8982.eurprd08.prod.outlook.com ([fe80::13be:967d:6e80:432f%9]) with mapi id 15.20.6277.036; Thu, 13 Apr 2023 14:29:58 +0000 To: 'GNU C Library' CC: Adhemerval Zanella Subject: [PATCH] math: Improve fmod(f) performance Thread-Topic: [PATCH] math: Improve fmod(f) performance Thread-Index: AQHZbhJm07aDR1yyTkuekLFyXoMslw== Date: Thu, 13 Apr 2023 14:29:58 +0000 Message-ID: Accept-Language: en-GB, en-US Content-Language: en-GB X-MS-Has-Attach: X-MS-TNEF-Correlator: msip_labels: Authentication-Results-Original: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=arm.com; x-ms-traffictypediagnostic: PAWPR08MB8982:EE_|AS2PR08MB9870:EE_|DBAEUR03FT021:EE_|PAWPR08MB10118:EE_ X-MS-Office365-Filtering-Correlation-Id: 8b07b0a4-61a9-4b3e-192a-08db3c2b9411 x-checkrecipientrouted: true nodisclaimer: true X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam-Untrusted: BCL:0; X-Microsoft-Antispam-Message-Info-Original: 58XdCqahcVsWY8aZu8ldgArxmssREypVE9qrwbEtHfpWQyOJdq3rYnBJGTKkXlMpHLdiMYRYrXr6qz3q4K/vgUfY8t2UiNWG0DiTC1zFo0vMi3VOuS9NtJuJMqbiwWbdpFHns93QyLvsS9adR3bfaQFbDjO7qPbpZbcf3RstJvrD1e6DKzYc6hjipcVcWWxXnZKS2zUSTF6V4WPDgJSB1Gznrlr6BqdWp7AHvCxLrFCGGJsZXjq1sPxdzQVkc9UY5ZlIwsIqwM5DITMdozjr+hSHJiDeV7LDF3/XLS1DHVZHmcMrgVq/SEgke7yS/NOGARrpA0hP1kdKLtMrG5jf17xDD7yZ7Yv2eOZZCMfoYX0Zh+JYCLIKs9JLvl9HPCUYcsfG7f0IjzWd/IvoL2K9IE+Ov5NM6JxGVXSjZnAKQAjcJYoE0Q8Yx3HVFiMRk33IngdxpEar+zqKhiUAyUdPwQsCtA4QFltzVblBCt7lM3aee5XjDg4qYbqAa77houw9ziVdO9qDZeDuM3HwQC1OQv0PSn1+DvQFy3iNWHPH9nOGDg4xd2ZBcygCIFL3gQY2FaNhWS//99dJZ6PlJXhcrdHJcB6u0ubJvMrWLxxCJx5v5S2b5VggK6VuimVvjVx9 X-Forefront-Antispam-Report-Untrusted: CIP:255.255.255.255; CTRY:; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:PAWPR08MB8982.eurprd08.prod.outlook.com; PTR:; CAT:NONE; SFS:(13230028)(4636009)(366004)(376002)(136003)(346002)(396003)(39860400002)(451199021)(71200400001)(478600001)(7696005)(9686003)(26005)(66476007)(186003)(316002)(91956017)(2906002)(76116006)(66946007)(6506007)(4326008)(66556008)(52536014)(66446008)(41300700001)(6916009)(8936002)(8676002)(64756008)(122000001)(38100700002)(55016003)(5660300002)(83380400001)(33656002)(86362001)(38070700005); DIR:OUT; SFP:1101; MIME-Version: 1.0 X-MS-Exchange-Transport-CrossTenantHeadersStamped: AS2PR08MB9870 Original-Authentication-Results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=arm.com; X-EOPAttributedMessage: 0 X-MS-Exchange-Transport-CrossTenantHeadersStripped: DBAEUR03FT021.eop-EUR03.prod.protection.outlook.com X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id-Prvs: a6d2758d-b2d2-4c35-91f1-08db3c2b8f13 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: MV2JT4ARl4UzdlMVn5cLdLqijX/HRWtHtJs+aNPu72Pz5SENOwA0lhvGQKPL58/FPp3sJS7Mw7Z1G0nSeYr10umBEUdUneAaDAsG75lDM9tDWkzOqMkZzY87SXXF50GxNBs9u2dePKjMzvBnhyjSAKCYOt0REaeMxkW9Vty+ljiWHqH3gi0VvJMPoXtBHWMuRPEvzMqKLypmlKJc/whY8woPjUuQmAQVNqHzgOp0RpjHt6Iap/PX2boJq1nA8ZCctJVbI52jWC1kmkIfhg0DSo0e/2scCzWu7AN3f13JN2z7QLg4pSsZyf2YcyLQDVGxoLLtbbmlhHbWq3SG8rCLwdYQY+4EpXLEEyHrmv2eznMoDhpKhAlL3hUj6lEJNieiMx4twho5HfuhKo/WMTeLfcMvOFW1hBqtlahiLg8X+lmzS5BxCrmUxJL+4YqPha0pSLaEfJ0J5l84UnI3pCPY2KWTpPTGa0Y7r9c66zYivkO2ASseEIoy3CKx6ZLZAv6o8D/CEpuyujKBzqqPcHiCoUVgncsF7Cc54jexM3/URIwU7Gj5ywoEucnyEBMagm0SmbBk5YEw/0CAsIXAN+J1WwhRZed8kXISlbNrAOlhd+wDnQcgBFS+yDr1zWsKMmWKgc6sL7xvanPA3fxIv0S5/mCE3to5d3so6iTsrZmfHKEKlptlA1dXZtFOCeTa+BF4x5xMONcls7STQhoUKVw/tQ== X-Forefront-Antispam-Report: CIP:63.35.35.123; CTRY:IE; LANG:en; SCL:1; SRV:; IPV:CAL; SFV:NSPM; H:64aa7808-outbound-1.mta.getcheckrecipient.com; PTR:ec2-63-35-35-123.eu-west-1.compute.amazonaws.com; CAT:NONE; SFS:(13230028)(4636009)(346002)(376002)(396003)(39860400002)(136003)(451199021)(40470700004)(36840700001)(46966006)(2906002)(86362001)(82740400003)(8936002)(356005)(81166007)(52536014)(5660300002)(40460700003)(40480700001)(55016003)(33656002)(82310400005)(336012)(83380400001)(47076005)(107886003)(9686003)(6506007)(26005)(186003)(478600001)(36860700001)(316002)(41300700001)(7696005)(6916009)(8676002)(70586007)(70206006)(4326008); DIR:OUT; SFP:1101; X-OriginatorOrg: arm.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 13 Apr 2023 14:30:06.8465 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 8b07b0a4-61a9-4b3e-192a-08db3c2b9411 X-MS-Exchange-CrossTenant-Id: f34e5979-57d9-4aaa-ad4d-b122a662184d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=f34e5979-57d9-4aaa-ad4d-b122a662184d; Ip=[63.35.35.123]; Helo=[64aa7808-outbound-1.mta.getcheckrecipient.com] X-MS-Exchange-CrossTenant-AuthSource: DBAEUR03FT021.eop-EUR03.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: PAWPR08MB10118 X-Spam-Status: No, score=-10.9 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, FORGED_SPF_HELO, GIT_PATCH_0, KAM_DMARC_NONE, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2, SPF_HELO_PASS, SPF_NONE, TXREP, UNPARSEABLE_RELAY autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Wilco Dijkstra via Libc-alpha From: Wilco Dijkstra Reply-To: Wilco Dijkstra Errors-To: libc-alpha-bounces+incoming=patchwork.ozlabs.org@sourceware.org Sender: "Libc-alpha" Optimize the fast paths (x < y) and (x/y < 2^12). Delay handling of special cases to reduce amount of code executed before the fast paths. Performance of the close-exponents benchmark improves by 18-19% on Cortex-A72 and Neoverse V1. Passes regress. diff --git a/sysdeps/ieee754/dbl-64/e_fmod.c b/sysdeps/ieee754/dbl-64/e_fmod.c index caae4e47e2358daced51a22342ec6e34a04f6fce..7c0ef11cb6c53561e64fab4f2ece17692dff3e5f 100644 --- a/sysdeps/ieee754/dbl-64/e_fmod.c +++ b/sysdeps/ieee754/dbl-64/e_fmod.c @@ -40,10 +40,10 @@ r == x % y == (x % (N * y)) % y - And with mx/my being mantissa of double floating point number (which uses + And with mx/my being mantissa of a double floating point number (which uses less bits than the storage type), on each step the argument reduction can be improved by 11 (which is the size of uint64_t minus MANTISSA_WIDTH plus - the signal bit): + the implicit one bit): mx * 2^ex == 2^11 * mx * 2^(ex - 11) @@ -54,7 +54,12 @@ mx << 11; ex -= 11; mx %= my; - } */ + } + + Special cases: + - If x or y is a NaN, a NaN is returned. + - If x is an infinity, or y is zero, a NaN is returned and EDOM is set. + - If x is +0/-0, and y is not zero, +0/-0 is returned. */ double __fmod (double x, double y) @@ -67,62 +72,70 @@ __fmod (double x, double y) hx ^= sx; hy &= ~SIGN_MASK; - /* Special cases: - - If x or y is a Nan, NaN is returned. - - If x is an inifinity, a NaN is returned and EDOM is set. - - If y is zero, Nan is returned. - - If x is +0/-0, and y is not zero, +0/-0 is returned. */ - if (__glibc_unlikely (hy == 0 - || hx >= EXPONENT_MASK || hy > EXPONENT_MASK)) - { - if (is_nan (hx) || is_nan (hy)) - return (x * y) / (x * y); - return __math_edom ((x * y) / (x * y)); - } - - if (__glibc_unlikely (hx <= hy)) + /* If x < y, return x (unless y is a NaN). */ + if (__glibc_likely (hx < hy)) { - if (hx < hy) - return x; - return asdouble (sx); + /* If y is a NaN, return a NaN. */ + if (__glibc_unlikely (hy > EXPONENT_MASK)) + return x * y; + return x; } int ex = hx >> MANTISSA_WIDTH; int ey = hy >> MANTISSA_WIDTH; + int exp_diff = ex - ey; + + /* Common case where exponents are close: |x/y| < 2^12, x not inf/NaN + and |x%y| not denormal. */ + if (__glibc_likely (ey < (EXPONENT_MASK >> MANTISSA_WIDTH) - EXPONENT_WIDTH + && ey > MANTISSA_WIDTH + && exp_diff <= EXPONENT_WIDTH)) + { + uint64_t mx = (hx << EXPONENT_WIDTH) | SIGN_MASK; + uint64_t my = (hy << EXPONENT_WIDTH) | SIGN_MASK; + + mx %= (my >> exp_diff); + + if (__glibc_unlikely (mx == 0)) + return asdouble (sx); + int shift = __builtin_clzll (mx); + ex -= shift + 1; + mx <<= shift; + mx = sx | (mx >> EXPONENT_WIDTH); + return asdouble (mx + ((uint64_t)ex << MANTISSA_WIDTH)); + } - /* Common case where exponents are close: ey >= -907 and |x/y| < 2^52, */ - if (__glibc_likely (ey > MANTISSA_WIDTH && ex - ey <= EXPONENT_WIDTH)) + if (__glibc_unlikely (hy == 0 || hx >= EXPONENT_MASK)) { - uint64_t mx = (hx & MANTISSA_MASK) | (MANTISSA_MASK + 1); - uint64_t my = (hy & MANTISSA_MASK) | (MANTISSA_MASK + 1); + /* If x is a NaN, return a NaN. */ + if (hx > EXPONENT_MASK) + return x * y; - uint64_t d = (ex == ey) ? (mx - my) : (mx << (ex - ey)) % my; - return make_double (d, ey - 1, sx); + /* If x is an infinity or y is zero, return a NaN and set EDOM. */ + return __math_edom ((x * y) / (x * y)); } - /* Special case, both x and y are subnormal. */ - if (__glibc_unlikely (ex == 0 && ey == 0)) + /* Special case, both x and y are denormal. */ + if (__glibc_unlikely (ex == 0)) return asdouble (sx | hx % hy); - /* Convert |x| and |y| to 'mx + 2^ex' and 'my + 2^ey'. Assume that hx is - not subnormal by conditions above. */ + /* Extract normalized mantissas - hx is not denormal and hy != 0. */ uint64_t mx = get_mantissa (hx) | (MANTISSA_MASK + 1); - ex--; uint64_t my = get_mantissa (hy) | (MANTISSA_MASK + 1); - int lead_zeros_my = EXPONENT_WIDTH; - if (__glibc_likely (ey > 0)) - ey--; - else + + ey--; + /* Special case for denormal y. */ + if (__glibc_unlikely (ey < 0)) { my = hy; + ey = 0; + exp_diff--; lead_zeros_my = clz_uint64 (my); } - /* Assume hy != 0 */ int tail_zeros_my = ctz_uint64 (my); int sides_zeroes = lead_zeros_my + tail_zeros_my; - int exp_diff = ex - ey; int right_shift = exp_diff < tail_zeros_my ? exp_diff : tail_zeros_my; my >>= right_shift; @@ -141,8 +154,7 @@ __fmod (double x, double y) if (exp_diff == 0) return make_double (mx, ey, sx); - /* Assume modulo/divide operation is slow, so use multiplication with invert - values. */ + /* Multiplication with the inverse is faster than repeated modulo. */ uint64_t inv_hy = UINT64_MAX / my; while (exp_diff > sides_zeroes) { exp_diff -= sides_zeroes; diff --git a/sysdeps/ieee754/flt-32/e_fmodf.c b/sysdeps/ieee754/flt-32/e_fmodf.c index 763900efdaa8222431e189e50a7e62baf465a54d..14f3fcae256d03ceb4bf91898a7a003bbfcb854a 100644 --- a/sysdeps/ieee754/flt-32/e_fmodf.c +++ b/sysdeps/ieee754/flt-32/e_fmodf.c @@ -40,10 +40,10 @@ r == x % y == (x % (N * y)) % y - And with mx/my being mantissa of double floating point number (which uses + And with mx/my being mantissa of a single floating point number (which uses less bits than the storage type), on each step the argument reduction can be improved by 8 (which is the size of uint32_t minus MANTISSA_WIDTH plus - the signal bit): + the implicit one bit): mx * 2^ex == 2^8 * mx * 2^(ex - 8) @@ -54,7 +54,12 @@ mx << 8; ex -= 8; mx %= my; - } */ + } + + Special cases: + - If x or y is a NaN, a NaN is returned. + - If x is an infinity, or y is zero, a NaN is returned and EDOM is set. + - If x is +0/-0, and y is not zero, +0/-0 is returned. */ float __fmodf (float x, float y) @@ -67,61 +72,69 @@ __fmodf (float x, float y) hx ^= sx; hy &= ~SIGN_MASK; - /* Special cases: - - If x or y is a Nan, NaN is returned. - - If x is an inifinity, a NaN is returned. - - If y is zero, Nan is returned. - - If x is +0/-0, and y is not zero, +0/-0 is returned. */ - if (__glibc_unlikely (hy == 0 - || hx >= EXPONENT_MASK || hy > EXPONENT_MASK)) - { - if (is_nan (hx) || is_nan (hy)) - return (x * y) / (x * y); - return __math_edomf ((x * y) / (x * y)); - } - - if (__glibc_unlikely (hx <= hy)) + if (__glibc_likely (hx < hy)) { - if (hx < hy) - return x; - return asfloat (sx); + /* If y is a NaN, return a NaN. */ + if (__glibc_unlikely (hy > EXPONENT_MASK)) + return x * y; + return x; } int ex = hx >> MANTISSA_WIDTH; int ey = hy >> MANTISSA_WIDTH; + int exp_diff = ex - ey; + + /* Common case where exponents are close: |x/y| < 2^9, x not inf/NaN + and |x%y| not denormal. */ + if (__glibc_likely (ey < (EXPONENT_MASK >> MANTISSA_WIDTH) - EXPONENT_WIDTH + && ey > MANTISSA_WIDTH + && exp_diff <= EXPONENT_WIDTH)) + { + uint32_t mx = (hx << EXPONENT_WIDTH) | SIGN_MASK; + uint32_t my = (hy << EXPONENT_WIDTH) | SIGN_MASK; + + mx %= (my >> exp_diff); + + if (__glibc_unlikely (mx == 0)) + return asfloat (sx); + int shift = __builtin_clz (mx); + ex -= shift + 1; + mx <<= shift; + mx = sx | (mx >> EXPONENT_WIDTH); + return asfloat (mx + ((uint32_t)ex << MANTISSA_WIDTH)); + } - /* Common case where exponents are close: ey >= -103 and |x/y| < 2^8, */ - if (__glibc_likely (ey > MANTISSA_WIDTH && ex - ey <= EXPONENT_WIDTH)) + if (__glibc_unlikely (hy == 0 || hx >= EXPONENT_MASK)) { - uint64_t mx = (hx & MANTISSA_MASK) | (MANTISSA_MASK + 1); - uint64_t my = (hy & MANTISSA_MASK) | (MANTISSA_MASK + 1); + /* If x is a NaN, return a NaN. */ + if (hx > EXPONENT_MASK) + return x * y; - uint32_t d = (ex == ey) ? (mx - my) : (mx << (ex - ey)) % my; - return make_float (d, ey - 1, sx); + /* If x is an infinity or y is zero, return a NaN and set EDOM. */ + return __math_edomf ((x * y) / (x * y)); } - /* Special case, both x and y are subnormal. */ - if (__glibc_unlikely (ex == 0 && ey == 0)) + /* Special case, both x and y are denormal. */ + if (__glibc_unlikely (ex == 0)) return asfloat (sx | hx % hy); - /* Convert |x| and |y| to 'mx + 2^ex' and 'my + 2^ey'. Assume that hx is - not subnormal by conditions above. */ + /* Extract normalized mantissas - hx is not denormal and hy != 0. */ uint32_t mx = get_mantissa (hx) | (MANTISSA_MASK + 1); - ex--; - uint32_t my = get_mantissa (hy) | (MANTISSA_MASK + 1); int lead_zeros_my = EXPONENT_WIDTH; - if (__glibc_likely (ey > 0)) - ey--; - else + + ey--; + /* Special case for denormal y. */ + if (__glibc_unlikely (ey < 0)) { my = hy; + ey = 0; + exp_diff--; lead_zeros_my = __builtin_clz (my); } int tail_zeros_my = __builtin_ctz (my); int sides_zeroes = lead_zeros_my + tail_zeros_my; - int exp_diff = ex - ey; int right_shift = exp_diff < tail_zeros_my ? exp_diff : tail_zeros_my; my >>= right_shift; @@ -140,8 +153,7 @@ __fmodf (float x, float y) if (exp_diff == 0) return make_float (mx, ey, sx); - /* Assume modulo/divide operation is slow, so use multiplication with invert - values. */ + /* Multiplication with the inverse is faster than repeated modulo. */ uint32_t inv_hy = UINT32_MAX / my; while (exp_diff > sides_zeroes) { exp_diff -= sides_zeroes;