From: Adhemerval Zanella
To: libc-alpha@sourceware.org, Wilco Dijkstra, "H . J . Lu"
Subject: [PATCH 0/4] Improve fmod and fmodf
Date: Fri, 10 Mar 2023 14:58:56 -0300
Message-Id: <20230310175900.2388957-1-adhemerval.zanella@linaro.org>

This is an updated version of a previous submission by Kirill Okhotnikov [1]
that aimed to improve the fmod implementation.  I extended it with:

  1. Proper benchmarks for both single and double precision.  The inputs
     are divided into three subsets: subnormals, normal numbers, and close
     exponents.  Each subset uses a list of randomly generated values.

  2. Use of math_config.h definitions instead of math_private.h (so it
     might eventually get back into optimized-routines).

  3. The same strategy implemented for the float version.

  4. A tuned final division that uses multiplication by the inverse
     instead of a direct modulo.  It showed better performance on both the
     x86_64 and aarch64 chips I have tested.
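
To illustrate item 4 in isolation, here is a minimal sketch of reducing
(m * 2^n) mod d with a precomputed inverse rather than a hardware modulo per
step.  It is not the code from this series: the names shl_mod_ref,
shl_mod_inv, and CHUNK are hypothetical, and it assumes m < d < 2^53 (as for
double mantissas), so that 11 bits can be shifted in per step without
overflowing 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Bits consumed per step: 64 - 53, assuming d < 2^53 (illustrative).  */
enum { CHUNK = 11 };

/* Reference: (m * 2^n) mod d, one bit at a time.  Requires m < d < 2^53.  */
static uint64_t
shl_mod_ref (uint64_t m, unsigned n, uint64_t d)
{
  assert (d != 0 && d < (UINT64_C (1) << 53) && m < d);
  while (n-- > 0)
    {
      m <<= 1;
      if (m >= d)
        m -= d;
    }
  return m;
}

/* Same result, using one integer division up front and then only
   multiplications, shifts, and subtractions.  Requires m < d < 2^53.  */
static uint64_t
shl_mod_inv (uint64_t m, unsigned n, uint64_t d)
{
  assert (d != 0 && d < (UINT64_C (1) << 53) && m < d);
  uint64_t inv = UINT64_MAX / d;    /* floor ((2^64 - 1) / d) */
  while (n > 0)
    {
      unsigned k = n < CHUNK ? n : CHUNK;
      /* m * inv < 2^64 because m < d.  Under the preconditions, q is
         either floor ((m << k) / d) or one less than that.  */
      uint64_t q = (m * inv) >> (64 - k);
      m = (m << k) - q * d;
      /* A single correction handles the possible underestimate.  */
      if (m >= d)
        m -= d;
      n -= k;
    }
  return m;
}
```

For example, shl_mod_inv (3, 12, 7) reduces 3 * 4096 modulo 7.  The one-time
division that computes inv is amortized over all steps, which is why this
style of loop tends to pay off when the exponent difference is large.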

The performance of fmod shows a good improvement over the current algorithm
(using gcc 11):

  Architecture     | Input           | master   | patch
  -----------------|-----------------|----------|--------
  x86_64 (Ryzen 9) | subnormals      | 19.1584  | 12.0932
  x86_64 (Ryzen 9) | normal          | 1016.51  | 301.204
  x86_64 (Ryzen 9) | close-exponents | 18.4428  | 16.8506
  aarch64 (N1)     | subnormals      | 11.153   | 6.81778
  aarch64 (N1)     | normal          | 528.649  | 158.339
  aarch64 (N1)     | close-exponents | 11.4517  | 8.67894

I also see similar improvements on arm-linux-gnueabihf when running on the
N1 aarch64 chips, where it uses many soft-fp implementations (for modulo,
clz, ctz, and multiplication):

  Architecture     | Input           | master   | patch
  -----------------|-----------------|----------|--------
  armhf (N1)       | subnormals      | 15.7284  | 15.1083
  armhf (N1)       | normal          | 837.525  | 244.833
  armhf (N1)       | close-exponents | 16.2111  | 21.8182

fmodf shows a more moderate improvement:

  Architecture     | Input           | master   | patch
  -----------------|-----------------|----------|--------
  x86_64 (Ryzen 9) | subnormals      | 17.2549  | 12.3214
  x86_64 (Ryzen 9) | normal          | 85.4096  | 52.6625
  x86_64 (Ryzen 9) | close-exponents | 19.1072  | 17.4622
  aarch64 (N1)     | subnormals      | 10.2182  | 6.81778
  aarch64 (N1)     | normal          | 60.0616  | 158.339
  aarch64 (N1)     | close-exponents | 11.5256  | 8.67894
  armhf (N1)       | subnormals      | 11.6662  | 10.8955
  armhf (N1)       | normal          | 69.2759  | 35.4184
  armhf (N1)       | close-exponents | 13.6472  | 17.8539

I also checked against H.J.'s proposal to use fprem on x86_64 [2] and against
a recent suggestion on libc-alpha [3]; in both cases this newer
implementation shows better performance.

[1] https://sourceware.org/pipermail/libc-alpha/2020-November/119794.html
[2] https://patchwork.sourceware.org/project/glibc/patch/20230309183312.205763-1-hjl.tools@gmail.com/
[3] https://sourceware.org/pipermail/libc-alpha/2023-March/146164.html

Adhemerval Zanella (4):
  benchtests: Add fmod benchmark
  benchtests: Add fmodf benchmark
  math: Improve fmod
  math: Improve fmodf

 benchtests/Makefile                  |    2 +
 benchtests/fmod-inputs               | 2182 ++++++++++++++++++++++++++
 benchtests/fmodf-inputs              | 2182 ++++++++++++++++++++++++++
 sysdeps/ieee754/dbl-64/e_fmod.c      |  234 +--
 sysdeps/ieee754/dbl-64/math_config.h |  110 ++
 sysdeps/ieee754/flt-32/e_fmodf.c     |  230 +--
 sysdeps/ieee754/flt-32/math_config.h |   89 ++
 7 files changed, 4840 insertions(+), 189 deletions(-)
 create mode 100644 benchtests/fmod-inputs
 create mode 100644 benchtests/fmodf-inputs