From patchwork Fri Mar 10 17:58:56 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Adhemerval Zanella <adhemerval.zanella@linaro.org>
X-Patchwork-Id: 1755475
Return-Path: <libc-alpha-bounces+incoming=patchwork.ozlabs.org@sourceware.org>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@legolas.ozlabs.org
Authentication-Results: legolas.ozlabs.org;
 spf=pass (sender SPF authorized) smtp.mailfrom=sourceware.org
 (client-ip=2620:52:3:1:0:246e:9693:128c; helo=sourceware.org;
 envelope-from=libc-alpha-bounces+incoming=patchwork.ozlabs.org@sourceware.org;
 receiver=<UNKNOWN>)
Authentication-Results: legolas.ozlabs.org;
	dkim=pass (1024-bit key;
 secure) header.d=sourceware.org header.i=@sourceware.org header.a=rsa-sha256
 header.s=default header.b=KsH1yrC/;
	dkim-atps=neutral
Received: from sourceware.org (server2.sourceware.org
 [IPv6:2620:52:3:1:0:246e:9693:128c])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature ECDSA (P-384) server-digest SHA384)
	(No client certificate requested)
	by legolas.ozlabs.org (Postfix) with ESMTPS id 4PYDMf33h6z246J
	for <incoming@patchwork.ozlabs.org>; Sat, 11 Mar 2023 04:59:26 +1100 (AEDT)
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id 1758E38582B0
	for <incoming@patchwork.ozlabs.org>; Fri, 10 Mar 2023 17:59:24 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 1758E38582B0
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org;
	s=default; t=1678471164;
	bh=bj9do/PxawO0VrLmzCsXZER+q+yIf4NVatr4wNk+FjY=;
	h=To:Subject:Date:List-Id:List-Unsubscribe:List-Archive:List-Post:
	 List-Help:List-Subscribe:From:Reply-To:From;
	b=KsH1yrC/t7/g9HKw8a7kTRcHXf7DY2RjrbqkPUgQndN++kQJssRQWmu24975FOAwk
	 GqXzZvZ15lx8YpI6jhw1tUKIpYn3OrB0uq2Ra70mfcionSL/C8Hd2Bl4MrCFNUWNLN
	 b1OI5vpIUDt/Y81Y9wz1U+hYrL5FaJ8e+GjGuT68=
X-Original-To: libc-alpha@sourceware.org
Delivered-To: libc-alpha@sourceware.org
Received: from mail-oa1-x33.google.com (mail-oa1-x33.google.com
 [IPv6:2001:4860:4864:20::33])
 by sourceware.org (Postfix) with ESMTPS id D42E33858039
 for <libc-alpha@sourceware.org>; Fri, 10 Mar 2023 17:59:07 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org D42E33858039
Received: by mail-oa1-x33.google.com with SMTP id
 586e51a60fabf-1763e201bb4so6739481fac.1
 for <libc-alpha@sourceware.org>; Fri, 10 Mar 2023 09:59:07 -0800 (PST)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112; t=1678471146;
 h=content-transfer-encoding:mime-version:message-id:date:subject:to
 :from:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to;
 bh=bj9do/PxawO0VrLmzCsXZER+q+yIf4NVatr4wNk+FjY=;
 b=bQcFgdX6p2dAs5MTrU/GtzdXGg6bJhCNVoiUVYi6HsnhO/m6SaGs93vGOlItVMz1JP
 BnoGUJbyiOuvM9bgA2mciaq/4CnmJjf1P3auaOZF+w/DXa168xjZeleT4Mdxfyl2eKhE
 vkW/+I3E2ZKn3nDC+bG2OT26w9Fg3CXAD+sqBrizY+mTc6+a0e3v7RXeTAxomHF859oU
 t9bxunlmNBsSIzXiZhmgY5xgwQLty1Zwz1WWBAYslBtfA9ytayExM/Q05cZPPoYkSKoX
 kEhHhFEvuTvzFQuSch574hZW8zBAv03i5d2GsZsRk5VZt3qLqFCPOG6HUig4/djz/SQe
 N8fw==
X-Gm-Message-State: AO0yUKUfuRIQOMioqVheufSUM//OV16PBgDcNr3cUuJltpJrxXiIh/MY
 NhrJrP59NyBRBX2fcJMrI8XK2k/Rf5JBY2uQHLdEGw==
X-Google-Smtp-Source: 
 AK7set9XhhoyJg25guu8IYzW4WiS1+9m0iltVGWFaYa09KujIAvIfp03m1xVORnZubbti7GUDkALyA==
X-Received: by 2002:a05:6870:b251:b0:172:55c7:9b9c with SMTP id
 b17-20020a056870b25100b0017255c79b9cmr17766388oam.9.1678471144398;
 Fri, 10 Mar 2023 09:59:04 -0800 (PST)
Received: from mandiga.. ([2804:1b3:a7c0:544b:5fc3:892c:864c:62f2])
 by smtp.gmail.com with ESMTPSA id
 v11-20020a9d5a0b000000b0068bcc902b82sm288416oth.71.2023.03.10.09.59.02
 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
 Fri, 10 Mar 2023 09:59:03 -0800 (PST)
To: libc-alpha@sourceware.org, Wilco Dijkstra <Wilco.Dijkstra@arm.com>,
 "H . J . Lu" <hjl.tools@gmail.com>
Subject: [PATCH 0/4] Improve fmod and fmodf
Date: Fri, 10 Mar 2023 14:58:56 -0300
Message-Id: <20230310175900.2388957-1-adhemerval.zanella@linaro.org>
X-Mailer: git-send-email 2.34.1
MIME-Version: 1.0
X-Spam-Status: No, score=-5.4 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,
 SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: libc-alpha@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Libc-alpha mailing list <libc-alpha.sourceware.org>
List-Unsubscribe: <https://sourceware.org/mailman/options/libc-alpha>,
 <mailto:libc-alpha-request@sourceware.org?subject=unsubscribe>
List-Archive: <https://sourceware.org/pipermail/libc-alpha/>
List-Post: <mailto:libc-alpha@sourceware.org>
List-Help: <mailto:libc-alpha-request@sourceware.org?subject=help>
List-Subscribe: <https://sourceware.org/mailman/listinfo/libc-alpha>,
 <mailto:libc-alpha-request@sourceware.org?subject=subscribe>
X-Patchwork-Original-From: Adhemerval Zanella via Libc-alpha
 <libc-alpha@sourceware.org>
From: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reply-To: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Errors-To: libc-alpha-bounces+incoming=patchwork.ozlabs.org@sourceware.org
Sender: "Libc-alpha"
 <libc-alpha-bounces+incoming=patchwork.ozlabs.org@sourceware.org>

This is an updated version of a previous submission by Kirill Okhotnikov
that aimed to improve the fmod implementation [1].  I extended it with:

  1. Proper benchmarks for both single and double.  The inputs are
     divided into 3 subsets: subnormals, normal numbers, and close
     exponents.  It uses a list of randomly generated values.

  2. Use math_config.h definitions instead of math_private.h (so it
     might eventually get back into optimized-routines).

  3. Implement the same strategy for float version.

  4. Tune the final division to use multiplication by the inverse
     instead of a direct modulo.  It showed better performance on both
     the x86_64 and aarch64 chips I have tested.
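
As a rough illustration of item 4 (a minimal sketch, not the code from
the patch), the reduction (m * 2^shift) mod d can be done with a single
division up front, followed by multiplications with the precomputed
inverse.  The name shl_mod_inverse and the chunk parameter are
hypothetical; the sketch assumes 0 < d, m < d, 1 <= chunk <= 63 and
d < 2^(64 - chunk):

```c
#include <stdint.h>

/* Hypothetical helper, for illustration only: returns (m * 2^shift) mod d.
   Assumes 0 < d, m < d, 1 <= chunk <= 63 and d < 2^(64 - chunk), so that
   m << chunk never overflows 64 bits.  */
static uint64_t
shl_mod_inverse (uint64_t m, unsigned shift, uint64_t d, unsigned chunk)
{
  /* The only real division: an approximation of 2^64 / d.  */
  uint64_t inv = UINT64_MAX / d;

  while (shift > 0)
    {
      /* Consume up to 'chunk' bits of the shift per iteration.  */
      unsigned s = shift < chunk ? shift : chunk;

      /* Estimate floor ((m << s) / d).  Because inv is rounded down, the
         estimate never exceeds the true quotient.  */
      uint64_t q = (m * inv) >> (64 - s);

      /* Remainder candidate; may still be slightly too large.  */
      m = (m << s) - q * d;

      /* Short correction loop to bring the remainder below d.  */
      while (m >= d)
        m -= d;

      shift -= s;
    }
  return m;
}
```

For double, both significands fit in 53 bits, so a chunk of 11 would
satisfy the d < 2^(64 - chunk) assumption, and shift would be the
exponent difference between the two operands.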

The performance shows a good improvement compared to the current
algorithm for fmod (using gcc 11):

  Architecture     | Input           | master   | patch
  -----------------|-----------------|----------|--------
  x86_64 (Ryzen 9) | subnormal       | 19.1584  | 12.0932
  x86_64 (Ryzen 9) | normal          | 1016.51  | 301.204
  x86_64 (Ryzen 9) | close-exponents | 18.4428  | 16.8506
  aarch64 (N1)     | subnormal       | 11.153   | 6.81778
  aarch64 (N1)     | normal          | 528.649  | 158.339
  aarch64 (N1)     | close-exponents | 11.4517  | 8.67894

I also see similar improvements on arm-linux-gnueabihf when running on
the N1 aarch64 chips, where it relies heavily on soft-fp implementations
(for modulo, clz, ctz, and multiplication):

  Architecture     | Input           | master   | patch
  -----------------|-----------------|----------|--------
  armhf (N1)       | subnormal       | 15.7284  | 15.1083
  armhf (N1)       | normal          | 837.525  | 244.833
  armhf (N1)       | close-exponents | 16.2111  | 21.8182


fmodf shows a more moderate improvement:

  Architecture     | Input           | master   | patch
  -----------------|-----------------|----------|--------
  x86_64 (Ryzen 9) | subnormal       | 17.2549  | 12.3214
  x86_64 (Ryzen 9) | normal          | 85.4096  | 52.6625
  x86_64 (Ryzen 9) | close-exponents | 19.1072  | 17.4622
  aarch64 (N1)     | subnormal       | 10.2182  | 6.81778
  aarch64 (N1)     | normal          | 60.0616  | 158.339
  aarch64 (N1)     | close-exponents | 11.5256  | 8.67894
  armhf (N1)       | subnormal       | 11.6662  | 10.8955
  armhf (N1)       | normal          | 69.2759  | 35.4184
  armhf (N1)       | close-exponents | 13.6472  | 17.8539


I also checked against H.J.'s proposal to use fprem on x86_64 [2] and
against a recent suggestion on libc-alpha [3]; in both cases this newer
implementation shows better performance.

[1] https://sourceware.org/pipermail/libc-alpha/2020-November/119794.html
[2] https://patchwork.sourceware.org/project/glibc/patch/20230309183312.205763-1-hjl.tools@gmail.com/
[3] https://sourceware.org/pipermail/libc-alpha/2023-March/146164.html

Adhemerval Zanella (4):
  benchtests: Add fmod benchmark
  benchtests: Add fmodf benchmark
  math: Improve fmod
  math: Improve fmodf

 benchtests/Makefile                  |    2 +
 benchtests/fmod-inputs               | 2182 ++++++++++++++++++++++++++
 benchtests/fmodf-inputs              | 2182 ++++++++++++++++++++++++++
 sysdeps/ieee754/dbl-64/e_fmod.c      |  234 +--
 sysdeps/ieee754/dbl-64/math_config.h |  110 ++
 sysdeps/ieee754/flt-32/e_fmodf.c     |  230 +--
 sysdeps/ieee754/flt-32/math_config.h |   89 ++
 7 files changed, 4840 insertions(+), 189 deletions(-)
 create mode 100644 benchtests/fmod-inputs
 create mode 100644 benchtests/fmodf-inputs