From patchwork Fri Oct 25 18:21:38 2024
X-Patchwork-Submitter: Adhemerval Zanella Netto
X-Patchwork-Id: 2002463
From: Adhemerval Zanella
To: libc-alpha@sourceware.org
Cc: Paul Zimmermann, Alexei Sibidanov
Subject: [PATCH 00/17] Add more CORE-MATH on libm
Date: Fri, 25 Oct 2024 15:21:38 -0300
Message-ID: <20241025182614.2022697-1-adhemerval.zanella@linaro.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit

Following the tgammaf implementation (392b3f0971764) and its telling
performance improvement, I worked with Paul Zimmermann to check whether
we can integrate more routines into glibc.

This patchset adds the optimized and correctly rounded exp10m1f, exp2m1f,
expm1f, log10f, log2p1f, log1pf, and log10p1f.  I also added a benchmark
to evaluate each implementation.

I tested the implementations on recent hardware (Ryzen 9 5900X for
x86_64, Ampere/Neoverse for aarch64, and POWER10 for powerpc), and most
of them show impressive performance improvements.  Like the
implementations from the ARM optimized routines, the CORE-MATH ones take
advantage of recent ISA and platform support (such as FMA and rounding
instructions, along with FP throughput).

For two implementations, exp10m1f and exp2m1f, CORE-MATH shows slightly
worse performance on x86_64-v1.  This is because the glibc generic
implementation calls the optimized exp10f/exp2f.  When a more recent ISA
is used (x86_64-v2 or x86_64-v3), CORE-MATH performs better than the
current implementation.  For both cases I added ifunc support to use FMA
on x86_64.

Adhemerval Zanella (17):
  math: Add e_gammaf_r to glibc code and style
  benchtests: Add exp10m1f benchmark
  benchtests: Add exp2m1f benchmark
  benchtests: Add expm1f benchmark
  benchtests: Add log10f benchmark
  benchtests: Add log2p1f benchmark
  benchtests: Add log1p benchmark
  benchtests: Add log10p1f benchmark
  math: Use exp10m1f from CORE-MATH
  math: Use exp2m1f from CORE-MATH
  math: Use expm1f from CORE-MATH
  math: Use log10f from CORE-MATH
  math: Use log2p1f from CORE-MATH
  math: Use log1pf from CORE-MATH
  math: Use log10p1f from CORE-MATH
  x86_64: Add exp10m1f with FMA
  x86_64: Add exp2m1f with FMA

 SHARED-FILES                                  |    16 +
 benchtests/Makefile                           |     7 +
 benchtests/exp10m1f-inputs                    |  2389 ++++++++++++++
 benchtests/exp2m1f-inputs                     |  2388 ++++++++++++++
 benchtests/expm1f-inputs                      |   799 +++++
 benchtests/log10f-inputs                      |  1005 ++++++
 benchtests/log10p1f-inputs                    |  2888 +++++++++++++++++
 benchtests/log1pf-inputs                      |  1005 ++++++
 benchtests/log2p1f-inputs                     |  2888 +++++++++++++++++
 sysdeps/aarch64/libm-test-ulps                |    29 +-
 sysdeps/alpha/fpu/libm-test-ulps              |    12 -
 sysdeps/arc/fpu/libm-test-ulps                |    25 -
 sysdeps/arc/nofpu/libm-test-ulps              |     7 -
 sysdeps/arm/libm-test-ulps                    |    31 +-
 sysdeps/csky/fpu/libm-test-ulps               |    12 -
 sysdeps/csky/nofpu/libm-test-ulps             |    12 -
 sysdeps/hppa/fpu/libm-test-ulps               |    28 -
 sysdeps/i386/fpu/e_log10f.S                   |    66 -
 sysdeps/i386/fpu/libm-test-ulps               |    25 -
 sysdeps/i386/fpu/s_expm1f.S                   |   112 -
 sysdeps/i386/fpu/s_log1pf.S                   |    66 -
 .../i386/i686/fpu/multiarch/libm-test-ulps    |    25 -
 sysdeps/ieee754/flt-32/e_gammaf_r.c           |   178 +-
 sysdeps/ieee754/flt-32/e_log10f.c             |   196 +-
 sysdeps/ieee754/flt-32/s_exp10m1f.c           |   227 ++
 sysdeps/ieee754/flt-32/s_exp2m1f.c            |   194 ++
 sysdeps/ieee754/flt-32/s_expm1f.c             |   232 +-
 sysdeps/ieee754/flt-32/s_log10p1f.c           |   182 ++
 sysdeps/ieee754/flt-32/s_log1pf.c             |   271 +-
 sysdeps/ieee754/flt-32/s_log2p1f.c            |   248 ++
 .../math_errf.c => ieee754/flt-32/w_log1pf.c} |     0
 sysdeps/loongarch/lp64/libm-test-ulps         |    28 -
 sysdeps/m68k/coldfire/fpu/libm-test-ulps      |     6 -
 sysdeps/m68k/m680x0/fpu/libm-test-ulps        |    12 -
 sysdeps/m68k/m680x0/fpu/w_log1pf.c            |    20 +
 sysdeps/microblaze/libm-test-ulps             |     3 -
 sysdeps/mips/mips32/libm-test-ulps            |    28 -
 sysdeps/mips/mips64/libm-test-ulps            |    28 -
 sysdeps/nios2/libm-test-ulps                  |     3 -
 sysdeps/or1k/fpu/libm-test-ulps               |     4 -
 sysdeps/or1k/nofpu/libm-test-ulps             |    12 -
 sysdeps/powerpc/fpu/libm-test-ulps            |    29 +-
 sysdeps/powerpc/nofpu/libm-test-ulps          |    28 -
 sysdeps/riscv/nofpu/libm-test-ulps            |    16 -
 sysdeps/riscv/rvd/libm-test-ulps              |    28 -
 sysdeps/s390/fpu/libm-test-ulps               |    28 -
 sysdeps/sh/libm-test-ulps                     |     6 -
 sysdeps/sparc/fpu/libm-test-ulps              |    28 -
 sysdeps/x86_64/fpu/libm-test-ulps             |    29 +-
 sysdeps/x86_64/fpu/multiarch/Makefile         |     4 +
 sysdeps/x86_64/fpu/multiarch/s_exp10m1f-fma.c |     4 +
 sysdeps/x86_64/fpu/multiarch/s_exp10m1f.c     |    33 +
 sysdeps/x86_64/fpu/multiarch/s_exp2m1f-fma.c  |     4 +
 sysdeps/x86_64/fpu/multiarch/s_exp2m1f.c      |    33 +
 54 files changed, 14873 insertions(+), 1104 deletions(-)
 create mode 100644 benchtests/exp10m1f-inputs
 create mode 100644 benchtests/exp2m1f-inputs
 create mode 100644 benchtests/expm1f-inputs
 create mode 100644 benchtests/log10f-inputs
 create mode 100644 benchtests/log10p1f-inputs
 create mode 100644 benchtests/log1pf-inputs
 create mode 100644 benchtests/log2p1f-inputs
 delete mode 100644 sysdeps/i386/fpu/e_log10f.S
 delete mode 100644 sysdeps/i386/fpu/s_expm1f.S
 delete mode 100644 sysdeps/i386/fpu/s_log1pf.S
 create mode 100644 sysdeps/ieee754/flt-32/s_exp10m1f.c
 create mode 100644 sysdeps/ieee754/flt-32/s_exp2m1f.c
 create mode 100644 sysdeps/ieee754/flt-32/s_log10p1f.c
 create mode 100644 sysdeps/ieee754/flt-32/s_log2p1f.c
 rename sysdeps/{m68k/m680x0/fpu/math_errf.c => ieee754/flt-32/w_log1pf.c} (100%)
 create mode 100644 sysdeps/m68k/m680x0/fpu/w_log1pf.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/s_exp10m1f-fma.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/s_exp10m1f.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/s_exp2m1f-fma.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/s_exp2m1f.c