[00/10] Optimized math routines

Message ID	d4963b4e-0013-adaf-df2f-698cf421a487@arm.com
Headers	show Return-Path: <libc-alpha-return-94050-incoming=patchwork.ozlabs.org@sourceware.org> DomainKey-Signature: a=rsa-sha1; c=nofws; d=sourceware.org; h=list-id :list-unsubscribe:list-subscribe:list-archive:list-post :list-help:sender:cc:to:from:subject:message-id:date :mime-version:content-type:content-transfer-encoding; q=dns; s= default; b=o8c9RHowWxqMEYncwNeftdfmYlxBSNVPiCkT2SXXeiw9HWgtH5BUS UPNnokITOHlKfd09DYNPt+TffniDQvp0Njsm47HEMOuoG4IZm8fc8qKELVVceDh5 2b5eLzhuDdCZtQ7t/quFCKPnMWOM5QnI9qR/LTP4tj+QP2VS+pQykM= Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk Sender: libc-alpha-owner@sourceware.org Cc: nd@arm.com, Wilco Dijkstra <Wilco.Dijkstra@arm.com>, Joseph Myers <joseph@codesourcery.com> To: GNU C Library <libc-alpha@sourceware.org> From: Szabolcs Nagy <szabolcs.nagy@arm.com> Subject: [PATCH 00/10] Optimized math routines Message-ID: <d4963b4e-0013-adaf-df2f-698cf421a487@arm.com> Date: Fri, 6 Jul 2018 09:47:57 +0100 User-Agent: Mozilla/5.0 (X11; Linux aarch64; rv:52.0) Gecko/20100101 Thunderbird/52.7.0 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Received-SPF: None (protection.outlook.com: arm.com does not designate permitted sender hosts)
Series	Optimized math routines \| expand [00/10] Optimized math routines [01/10] Clean up converttoint handling and document the semantics [02/10] Improve performance of sincosf [03/10] Improve performance of sinf and cosf [04/10] Add new exp and exp2 implementations [05/10] aarch64: update libm-test-ulps [06/10] arm: update libm-test-ulps [07/10] x86_64: update libm-test-ulps [08/10] Add new log implementation [09/10] Add new log2 implementation [10/10] Add new pow implementation

Szabolcs Nagy July 6, 2018, 8:47 a.m. UTC

Optimized exp, exp2, log, log2, pow, sinf, cosf and sincosf
implementations.

I updated the patches according to comments. I included the
sincosf patches from Wilco, but split out the math_config.h
toint changes into a separate patch.

This set does not include the wrapper removal that changes ABI.
I'm still testing those and they can be applied separately.
The new exp increases the ulp error bounds in non-nearest
rounding modes, so ulp updates are needed.

Szabolcs Nagy (8):
   Clean up converttoint handling and document the semantics
   Add new exp and exp2 implementations
   aarch64: update libm-test-ulps
   arm: update libm-test-ulps
   x86_64: update libm-test-ulps
   Add new log implementation
   Add new log2 implementation
   Add new pow implementation

Wilco Dijkstra (2):
   Improve performance of sincosf
   Improve performance of sinf and cosf

  NEWS                                         |     2 +
  math/Makefile                                |     7 +-
  sysdeps/aarch64/fpu/math_private.h           |    17 +-
  sysdeps/aarch64/libm-test-ulps               |   112 +-
  sysdeps/arm/libm-test-ulps                   |    88 +-
  sysdeps/generic/math_private.h               |     1 -
  sysdeps/i386/fpu/e_exp_data.c                |     1 +
  sysdeps/i386/fpu/e_log2_data.c               |     1 +
  sysdeps/i386/fpu/e_log_data.c                |     1 +
  sysdeps/i386/fpu/e_pow_log_data.c            |     1 +
  sysdeps/i386/fpu/math_err.c                  |     1 +
  sysdeps/i386/fpu/t_exp.c                     |     1 -
  sysdeps/ia64/fpu/e_exp_data.c                |     1 +
  sysdeps/ia64/fpu/e_log2_data.c               |     1 +
  sysdeps/ia64/fpu/e_log_data.c                |     1 +
  sysdeps/ia64/fpu/e_pow_log_data.c            |     1 +
  sysdeps/ia64/fpu/math_err.c                  |     1 +
  sysdeps/ia64/fpu/s_sincosf_data.c            |     1 +
  sysdeps/ia64/fpu/t_exp.c                     |     1 -
  sysdeps/ieee754/dbl-64/Makefile              |     1 -
  sysdeps/ieee754/dbl-64/e_exp.c               |   477 +-
  sysdeps/ieee754/dbl-64/e_exp2.c              |   220 +-
  sysdeps/ieee754/dbl-64/e_exp_data.c          |   196 +
  sysdeps/ieee754/dbl-64/e_log.c               |   257 +-
  sysdeps/ieee754/dbl-64/e_log2.c              |   240 +-
  sysdeps/ieee754/dbl-64/e_log2_data.c         |   194 +
  sysdeps/ieee754/dbl-64/e_log_data.c          |   321 +
  sysdeps/ieee754/dbl-64/e_pow.c               |   656 +-
  sysdeps/ieee754/dbl-64/e_pow_log_data.c      |   173 +
  sysdeps/ieee754/dbl-64/eexp.tbl              |   172 -
  sysdeps/ieee754/dbl-64/math_config.h         |   187 +
  sysdeps/ieee754/dbl-64/math_err.c            |    92 +
  sysdeps/ieee754/dbl-64/t_exp.c               |   435 --
  sysdeps/ieee754/dbl-64/t_exp2.h              |   585 --
  sysdeps/ieee754/dbl-64/uexp.h                |    68 -
  sysdeps/ieee754/dbl-64/uexp.tbl              |  1786 -----
  sysdeps/ieee754/dbl-64/ulog.h                |    93 -
  sysdeps/ieee754/dbl-64/ulog.tbl              |  3326 --------
  sysdeps/ieee754/dbl-64/upow.h                |    76 -
  sysdeps/ieee754/dbl-64/upow.tbl              | 10188 -------------------------
  sysdeps/ieee754/dbl-64/wordsize-64/e_log2.c  |   128 -
  sysdeps/ieee754/flt-32/e_expf.c              |     5 +-
  sysdeps/ieee754/flt-32/math_config.h         |    21 +-
  sysdeps/ieee754/flt-32/s_cosf.c              |   161 +-
  sysdeps/ieee754/flt-32/s_sincosf.c           |   197 +-
  sysdeps/ieee754/flt-32/s_sincosf.h           |   258 +-
  sysdeps/ieee754/flt-32/s_sincosf_data.c      |    74 +
  sysdeps/ieee754/flt-32/s_sinf.c              |   172 +-
  sysdeps/m68k/m680x0/fpu/e_exp_data.c         |     1 +
  sysdeps/m68k/m680x0/fpu/e_log2_data.c        |     1 +
  sysdeps/m68k/m680x0/fpu/e_log_data.c         |     1 +
  sysdeps/m68k/m680x0/fpu/e_pow_log_data.c     |     1 +
  sysdeps/m68k/m680x0/fpu/math_err.c           |     1 +
  sysdeps/m68k/m680x0/fpu/s_sincosf_data.c     |     1 +
  sysdeps/m68k/m680x0/fpu/t_exp.c              |     1 -
  sysdeps/x86_64/fpu/libm-test-ulps            |   100 +-
  sysdeps/x86_64/fpu/multiarch/Makefile        |     4 +-
  sysdeps/x86_64/fpu/multiarch/s_sincosf-fma.c |    33 +-
  58 files changed, 2606 insertions(+), 18537 deletions(-)
  create mode 100644 sysdeps/i386/fpu/e_exp_data.c
  create mode 100644 sysdeps/i386/fpu/e_log2_data.c
  create mode 100644 sysdeps/i386/fpu/e_log_data.c
  create mode 100644 sysdeps/i386/fpu/e_pow_log_data.c
  create mode 100644 sysdeps/i386/fpu/math_err.c
  delete mode 100644 sysdeps/i386/fpu/t_exp.c
  create mode 100644 sysdeps/ia64/fpu/e_exp_data.c
  create mode 100644 sysdeps/ia64/fpu/e_log2_data.c
  create mode 100644 sysdeps/ia64/fpu/e_log_data.c
  create mode 100644 sysdeps/ia64/fpu/e_pow_log_data.c
  create mode 100644 sysdeps/ia64/fpu/math_err.c
  create mode 100644 sysdeps/ia64/fpu/s_sincosf_data.c
  delete mode 100644 sysdeps/ia64/fpu/t_exp.c
  create mode 100644 sysdeps/ieee754/dbl-64/e_exp_data.c
  create mode 100644 sysdeps/ieee754/dbl-64/e_log2_data.c
  create mode 100644 sysdeps/ieee754/dbl-64/e_log_data.c
  create mode 100644 sysdeps/ieee754/dbl-64/e_pow_log_data.c
  delete mode 100644 sysdeps/ieee754/dbl-64/eexp.tbl
  create mode 100644 sysdeps/ieee754/dbl-64/math_config.h
  create mode 100644 sysdeps/ieee754/dbl-64/math_err.c
  delete mode 100644 sysdeps/ieee754/dbl-64/t_exp.c
  delete mode 100644 sysdeps/ieee754/dbl-64/t_exp2.h
  delete mode 100644 sysdeps/ieee754/dbl-64/uexp.h
  delete mode 100644 sysdeps/ieee754/dbl-64/uexp.tbl
  delete mode 100644 sysdeps/ieee754/dbl-64/ulog.h
  delete mode 100644 sysdeps/ieee754/dbl-64/ulog.tbl
  delete mode 100644 sysdeps/ieee754/dbl-64/upow.h
  delete mode 100644 sysdeps/ieee754/dbl-64/upow.tbl
  delete mode 100644 sysdeps/ieee754/dbl-64/wordsize-64/e_log2.c
  create mode 100644 sysdeps/ieee754/flt-32/s_sincosf_data.c
  create mode 100644 sysdeps/m68k/m680x0/fpu/e_exp_data.c
  create mode 100644 sysdeps/m68k/m680x0/fpu/e_log2_data.c
  create mode 100644 sysdeps/m68k/m680x0/fpu/e_log_data.c
  create mode 100644 sysdeps/m68k/m680x0/fpu/e_pow_log_data.c
  create mode 100644 sysdeps/m68k/m680x0/fpu/math_err.c
  create mode 100644 sysdeps/m68k/m680x0/fpu/s_sincosf_data.c
  delete mode 100644 sysdeps/m68k/m680x0/fpu/t_exp.c

Carlos O'Donell July 6, 2018, 12:43 p.m. UTC | #1

On 07/06/2018 04:47 AM, Szabolcs Nagy wrote:
> Optimized exp, exp2, log, log2, pow, sinf, cosf and sincosf
> implementations.

Is it your intent to have these included in 2.28?

Szabolcs Nagy July 6, 2018, 3:46 p.m. UTC | #2

On 06/07/18 13:43, Carlos O'Donell wrote:
> On 07/06/2018 04:47 AM, Szabolcs Nagy wrote:
>> Optimized exp, exp2, log, log2, pow, sinf, cosf and sincosf
>> implementations.
> 
> Is it your intent to have these included in 2.28?
> 

(resending as my previous mail seems to be lost)

yes, i'd like to add it to the 'desirable in 2.28' list
if Joseph is ok with the code, but i see he is not available
right now for review.

i don't know how other maintainers feel about such change,
there needs to be an ulp update (i'm willing to do that for
targets i can access hw for testing).

Carlos O'Donell July 6, 2018, 4:27 p.m. UTC | #3

On 07/06/2018 11:46 AM, Szabolcs Nagy wrote:
> On 06/07/18 13:43, Carlos O'Donell wrote:
>> On 07/06/2018 04:47 AM, Szabolcs Nagy wrote:
>>> Optimized exp, exp2, log, log2, pow, sinf, cosf and sincosf
>>> implementations.
>>
>> Is it your intent to have these included in 2.28?
>>
> 
> (resending as my previous mail seems to be lost)
> 
> yes, i'd like to add it to the 'desirable in 2.28' list
> if Joseph is ok with the code, but i see he is not available
> right now for review.
> 
> i don't know how other maintainers feel about such change,
> there needs to be an ulp update (i'm willing to do that for
> targets i can access hw for testing).

Where there any unanswered questions in your v4 review?

Do you think v4 is basically as good as it will get?

Who were the people who signed off on the review?

Szabolcs Nagy July 6, 2018, 5:17 p.m. UTC | #4

On 06/07/18 17:27, Carlos O'Donell wrote:
> On 07/06/2018 11:46 AM, Szabolcs Nagy wrote:
>> On 06/07/18 13:43, Carlos O'Donell wrote:
>>> On 07/06/2018 04:47 AM, Szabolcs Nagy wrote:
>>>> Optimized exp, exp2, log, log2, pow, sinf, cosf and sincosf
>>>> implementations.
>>>
>>> Is it your intent to have these included in 2.28?
>>>
>>
>> (resending as my previous mail seems to be lost)
>>
>> yes, i'd like to add it to the 'desirable in 2.28' list
>> if Joseph is ok with the code, but i see he is not available
>> right now for review.
>>
>> i don't know how other maintainers feel about such change,
>> there needs to be an ulp update (i'm willing to do that for
>> targets i can access hw for testing).
> 
> Where there any unanswered questions in your v4 review?
> 
> Do you think v4 is basically as good as it will get?
> 
> Who were the people who signed off on the review?
> 

Joseph Myers started the review of both the sinf, cosf, sincosf
changes and the exp, exp2, log, log2, pow changes.

I think I addressed all of his comments in an acceptable way,
but i don't know if he had other concerns or if parts of the
code he has not reviewed yet.

Since the glibc tests pass on 3 different targets (and
build-many-glibcs.py) i think there is no danger of the
patch being completely broken.  Wilco and I tested the patches
in detail outside of glibc so it is the glibc integration where
I expected most of the issues.

I don't expect performance regression on any target, but it
was not measured e.g. on powerpc (only aarch64 and x86_64)
which might have different behaviour (previous sincosf was
optimized on that target hence it might make sense to retest
the new code to be sure).

I think the patches are in a good quality state now.
(The ABI changing part needs further work so i didn't post that.)

Szabolcs Nagy July 9, 2018, 12:15 p.m. UTC | #5

On 06/07/18 18:17, Szabolcs Nagy wrote:
> On 06/07/18 17:27, Carlos O'Donell wrote:
>> On 07/06/2018 11:46 AM, Szabolcs Nagy wrote:
>>> On 06/07/18 13:43, Carlos O'Donell wrote:
>>>> On 07/06/2018 04:47 AM, Szabolcs Nagy wrote:
>>>>> Optimized exp, exp2, log, log2, pow, sinf, cosf and sincosf
>>>>> implementations.
>>>>
>>>> Is it your intent to have these included in 2.28?
>>>>
>>>
>>> (resending as my previous mail seems to be lost)
>>>
>>> yes, i'd like to add it to the 'desirable in 2.28' list
>>> if Joseph is ok with the code, but i see he is not available
>>> right now for review.
>>>
>>> i don't know how other maintainers feel about such change,
>>> there needs to be an ulp update (i'm willing to do that for
>>> targets i can access hw for testing).
>>
>> Where there any unanswered questions in your v4 review?
>>
>> Do you think v4 is basically as good as it will get?
>>
>> Who were the people who signed off on the review?
>>
> 
> Joseph Myers started the review of both the sinf, cosf, sincosf
> changes and the exp, exp2, log, log2, pow changes.
> 
> I think I addressed all of his comments in an acceptable way,
> but i don't know if he had other concerns or if parts of the
> code he has not reviewed yet.
> 
> Since the glibc tests pass on 3 different targets (and
> build-many-glibcs.py) i think there is no danger of the
> patch being completely broken.  Wilco and I tested the patches
> in detail outside of glibc so it is the glibc integration where
> I expected most of the issues.
> 
> I don't expect performance regression on any target, but it
> was not measured e.g. on powerpc (only aarch64 and x86_64)
> which might have different behaviour (previous sincosf was
> optimized on that target hence it might make sense to retest
> the new code to be sure).
> 
> I think the patches are in a good quality state now.
> (The ABI changing part needs further work so i didn't post that.)

built and tested on a power8 machine now, glibc math
tests pass (except for an unrelated fmal failure),
benchmark improvements are consistent with aarch64/x86_64,
but it was a shared access machine so i won't post exact
numbers, sincosf improved a bit too, sinf/cosf didn't
(apparently powerpc has its own implementation).

Adhemerval Zanella Netto July 9, 2018, 1:09 p.m. UTC | #6

On 09/07/2018 09:15, Szabolcs Nagy wrote:
> On 06/07/18 18:17, Szabolcs Nagy wrote:
>> On 06/07/18 17:27, Carlos O'Donell wrote:
>>> On 07/06/2018 11:46 AM, Szabolcs Nagy wrote:
>>>> On 06/07/18 13:43, Carlos O'Donell wrote:
>>>>> On 07/06/2018 04:47 AM, Szabolcs Nagy wrote:
>>>>>> Optimized exp, exp2, log, log2, pow, sinf, cosf and sincosf
>>>>>> implementations.
>>>>>
>>>>> Is it your intent to have these included in 2.28?
>>>>>
>>>>
>>>> (resending as my previous mail seems to be lost)
>>>>
>>>> yes, i'd like to add it to the 'desirable in 2.28' list
>>>> if Joseph is ok with the code, but i see he is not available
>>>> right now for review.
>>>>
>>>> i don't know how other maintainers feel about such change,
>>>> there needs to be an ulp update (i'm willing to do that for
>>>> targets i can access hw for testing).
>>>
>>> Where there any unanswered questions in your v4 review?
>>>
>>> Do you think v4 is basically as good as it will get?
>>>
>>> Who were the people who signed off on the review?
>>>
>>
>> Joseph Myers started the review of both the sinf, cosf, sincosf
>> changes and the exp, exp2, log, log2, pow changes.
>>
>> I think I addressed all of his comments in an acceptable way,
>> but i don't know if he had other concerns or if parts of the
>> code he has not reviewed yet.
>>
>> Since the glibc tests pass on 3 different targets (and
>> build-many-glibcs.py) i think there is no danger of the
>> patch being completely broken.  Wilco and I tested the patches
>> in detail outside of glibc so it is the glibc integration where
>> I expected most of the issues.
>>
>> I don't expect performance regression on any target, but it
>> was not measured e.g. on powerpc (only aarch64 and x86_64)
>> which might have different behaviour (previous sincosf was
>> optimized on that target hence it might make sense to retest
>> the new code to be sure).
>>
>> I think the patches are in a good quality state now.
>> (The ABI changing part needs further work so i didn't post that.)
> 
> built and tested on a power8 machine now, glibc math
> tests pass (except for an unrelated fmal failure),
> benchmark improvements are consistent with aarch64/x86_64,
> but it was a shared access machine so i won't post exact
> numbers, sincosf improved a bit too, sinf/cosf didn't
> (apparently powerpc has its own implementation).

PowerPC sinf/cosf uses the same algorithm used on x86, I presume
it would be a gain to generic implementation as well.

Szabolcs Nagy July 9, 2018, 1:34 p.m. UTC | #7

On 09/07/18 14:09, Adhemerval Zanella wrote:
> On 09/07/2018 09:15, Szabolcs Nagy wrote:
>> built and tested on a power8 machine now, glibc math
>> tests pass (except for an unrelated fmal failure),
>> benchmark improvements are consistent with aarch64/x86_64,
>> but it was a shared access machine so i won't post exact
>> numbers, sincosf improved a bit too, sinf/cosf didn't
>> (apparently powerpc has its own implementation).
> 
> PowerPC sinf/cosf uses the same algorithm used on x86, I presume
> it would be a gain to generic implementation as well.
> 

you mean the new implementation would be better or the
target specific one?

new implementation has better latency on this particular
powerpc machine than the target specific code, but
throughput is worse sometimes (using the default 0
setting for PREFER_FLOAT_COMPARISON).

Adhemerval Zanella Netto July 9, 2018, 2:26 p.m. UTC | #8

On 09/07/2018 10:34, Szabolcs Nagy wrote:
> On 09/07/18 14:09, Adhemerval Zanella wrote:
>> On 09/07/2018 09:15, Szabolcs Nagy wrote:
>>> built and tested on a power8 machine now, glibc math
>>> tests pass (except for an unrelated fmal failure),
>>> benchmark improvements are consistent with aarch64/x86_64,
>>> but it was a shared access machine so i won't post exact
>>> numbers, sincosf improved a bit too, sinf/cosf didn't
>>> (apparently powerpc has its own implementation).
>>
>> PowerPC sinf/cosf uses the same algorithm used on x86, I presume
>> it would be a gain to generic implementation as well.
>>
> 
> you mean the new implementation would be better or the
> target specific one?
> 
> new implementation has better latency on this particular
> powerpc machine than the target specific code, but
> throughput is worse sometimes (using the default 0
> setting for PREFER_FLOAT_COMPARISON).

I did not measure, but I would expect.  PowerPC uses an different
implementation for generic code (s_sinf-ppc64.c) so comparing against
it maybe misleading (since it use the old implementation still).

I am not sure which compiler you used for evaluation, but at least
Ubuntu 16.04 one (gcc 5.4) does not use POWER8 ISA as default and
even with -mcpu=power8 it generates subpar code.  I will try to
check with a GCC 7.1 (but as for your environment, I am using
a shared machine, although it I think I might get slight better
results because it uses a micro-partition).

For PREFER_FLOAT_COMPARISON, do we use this on glibc? I think
it is only enabled on optimized-routines, isn't it?

Szabolcs Nagy July 9, 2018, 3:41 p.m. UTC | #9

On 09/07/18 15:26, Adhemerval Zanella wrote:
> On 09/07/2018 10:34, Szabolcs Nagy wrote:
>> On 09/07/18 14:09, Adhemerval Zanella wrote:
>>> On 09/07/2018 09:15, Szabolcs Nagy wrote:
>>>> built and tested on a power8 machine now, glibc math
>>>> tests pass (except for an unrelated fmal failure),
>>>> benchmark improvements are consistent with aarch64/x86_64,
>>>> but it was a shared access machine so i won't post exact
>>>> numbers, sincosf improved a bit too, sinf/cosf didn't
>>>> (apparently powerpc has its own implementation).
>>>
>>> PowerPC sinf/cosf uses the same algorithm used on x86, I presume
>>> it would be a gain to generic implementation as well.
>>>
>>
>> you mean the new implementation would be better or the
>> target specific one?
>>
>> new implementation has better latency on this particular
>> powerpc machine than the target specific code, but
>> throughput is worse sometimes (using the default 0
>> setting for PREFER_FLOAT_COMPARISON).
> 
> I did not measure, but I would expect.  PowerPC uses an different
> implementation for generic code (s_sinf-ppc64.c) so comparing against
> it maybe misleading (since it use the old implementation still).
> 

i'm comparing two glibc builds, they both still use the
same (old) code for sinf/cosf so there is nothing misleading.

the sincosf code is generic though and the new implementation
does show some speedup.

> I am not sure which compiler you used for evaluation, but at least
> Ubuntu 16.04 one (gcc 5.4) does not use POWER8 ISA as default and
> even with -mcpu=power8 it generates subpar code.  I will try to
> check with a GCC 7.1 (but as for your environment, I am using
> a shared machine, although it I think I might get slight better
> results because it uses a micro-partition).
> 

i built a gcc 7.3.0 toolchain, the host toolchain would
not be able to build glibc (gcc-4.8), same for the host
make (3.82). (it's a gcc build farm machine)

> For PREFER_FLOAT_COMPARISON, do we use this on glibc? I think
> it is only enabled on optimized-routines, isn't it?

it is disabled by default, it is there so targets can enable
it if float compares are faster than using the representation,
currently disabled everywhere in glibc.
(i don't want to change that setting now, that case can
be tweaked later)

Adhemerval Zanella Netto July 9, 2018, 6:19 p.m. UTC | #10

On 09/07/2018 12:41, Szabolcs Nagy wrote:
> On 09/07/18 15:26, Adhemerval Zanella wrote:
>> On 09/07/2018 10:34, Szabolcs Nagy wrote:
>>> On 09/07/18 14:09, Adhemerval Zanella wrote:
>>>> On 09/07/2018 09:15, Szabolcs Nagy wrote:
>>>>> built and tested on a power8 machine now, glibc math
>>>>> tests pass (except for an unrelated fmal failure),
>>>>> benchmark improvements are consistent with aarch64/x86_64,
>>>>> but it was a shared access machine so i won't post exact
>>>>> numbers, sincosf improved a bit too, sinf/cosf didn't
>>>>> (apparently powerpc has its own implementation).
>>>>
>>>> PowerPC sinf/cosf uses the same algorithm used on x86, I presume
>>>> it would be a gain to generic implementation as well.
>>>>
>>>
>>> you mean the new implementation would be better or the
>>> target specific one?
>>>
>>> new implementation has better latency on this particular
>>> powerpc machine than the target specific code, but
>>> throughput is worse sometimes (using the default 0
>>> setting for PREFER_FLOAT_COMPARISON).
>>
>> I did not measure, but I would expect.  PowerPC uses an different
>> implementation for generic code (s_sinf-ppc64.c) so comparing against
>> it maybe misleading (since it use the old implementation still).
>>
> 
> i'm comparing two glibc builds, they both still use the
> same (old) code for sinf/cosf so there is nothing misleading.

I meant comparing the generic s_sinf against powerpc's one (since
default ifunc selection for powerpc is not the generic is not
sysdeps/ieee754/flt-32/s_sinf.c). But indeed this is not for
this case, sorry for the noise.

> 
> the sincosf code is generic though and the new implementation
> does show some speedup.
> 
>> I am not sure which compiler you used for evaluation, but at least
>> Ubuntu 16.04 one (gcc 5.4) does not use POWER8 ISA as default and
>> even with -mcpu=power8 it generates subpar code.  I will try to
>> check with a GCC 7.1 (but as for your environment, I am using
>> a shared machine, although it I think I might get slight better
>> results because it uses a micro-partition).
>>
> 
> i built a gcc 7.3.0 toolchain, the host toolchain would
> not be able to build glibc (gcc-4.8), same for the host
> make (3.82). (it's a gcc build farm machine)

Using glibc benchmarks, I also see better results with power8 
implementation:

s_sinf-power8:

  "sinf": {
   "": {
    "duration": 5.12725e+09,
    "iterations": 7.03494e+08,
    "max": 983.08,
    "min": 6.06,
    "mean": 7.28827
   }
  }


s_sinf-ppc64:

  "sinf": {
   "": {
    "duration": 5.13064e+09,
    "iterations": 1.86048e+08,
    "max": 1032.52,
    "min": 8.035,
    "mean": 27.577
   }
  }

generic s_sinf:

  "sinf": {
   "": {
    "duration": 5.12404e+09,
    "iterations": 6.74424e+08,
    "max": 515.97,
    "min": 6.089,
    "mean": 7.59765
   }
  }

One remark is I think we can get rid of generic powerpc sinf
(sysdeps/powerpc/fpu/s_sinf.c) and use generic implementation
instead.

> 
>> For PREFER_FLOAT_COMPARISON, do we use this on glibc? I think
>> it is only enabled on optimized-routines, isn't it?
> 
> it is disabled by default, it is there so targets can enable
> it if float compares are faster than using the representation,
> currently disabled everywhere in glibc.
> (i don't want to change that setting now, that case can
> be tweaked later)

Szabolcs Nagy July 11, 2018, 1:39 p.m. UTC | #11

On 06/07/18 17:27, Carlos O'Donell wrote:
> On 07/06/2018 11:46 AM, Szabolcs Nagy wrote:
>> On 06/07/18 13:43, Carlos O'Donell wrote:
>>> On 07/06/2018 04:47 AM, Szabolcs Nagy wrote:
>>>> Optimized exp, exp2, log, log2, pow, sinf, cosf and sincosf
>>>> implementations.
>>>
>>> Is it your intent to have these included in 2.28?
>>>
>>
>> (resending as my previous mail seems to be lost)
>>
>> yes, i'd like to add it to the 'desirable in 2.28' list
>> if Joseph is ok with the code, but i see he is not available
>> right now for review.
>>
>> i don't know how other maintainers feel about such change,
>> there needs to be an ulp update (i'm willing to do that for
>> targets i can access hw for testing).
> 
> Where there any unanswered questions in your v4 review?
> 
> Do you think v4 is basically as good as it will get?
> 
> Who were the people who signed off on the review?
> 

i made two minor modifications (codegen is not affected):
- removed an unused configuration option from sincosf.
- fixed a comment in pow.

i ran glibc tests on powerpc64le-linux-gnu and i686-linux-gnu
too now.

i will also update the improvement numbers in the commit messages
to reflect dynamic linked master vs new instead of static linked
master vs nowrapper improvements (i didn't repost the patchset
just to change those numbers though).

i propose this change to be included in glibc 2.28 (added it to
the desired features), i'm willing to do further tweaks if
necessary.

i believe the code provides the documented math quality
guarantees and does not introduce regressions on any target
(other than ulp bound update is needed after the exp patch).

the wrapper removal is deferred to glibc 2.29 (it requires
some changes to the libm alias machinery, it changes abi
and target maintainers may need to tweak the ifunc mechanism
where appropriate for optimal performance)

Joseph Myers July 17, 2018, 9:59 p.m. UTC | #12

On Fri, 6 Jul 2018, Szabolcs Nagy wrote:

> I think I addressed all of his comments in an acceptable way,
> but i don't know if he had other concerns or if parts of the
> code he has not reviewed yet.

The missing information in comments in previous versions effectively 
prevented review of substantial parts of the code in those versions; 
comments are critical information about the interfaces between different 
parts of the code that are required for effective review (as well as for 
subsequent maintainability of the code).

I think these changes are far too high risk for this late in the freeze.

Carlos O'Donell July 18, 2018, 12:07 a.m. UTC | #13

On 07/17/2018 05:59 PM, Joseph Myers wrote:
> On Fri, 6 Jul 2018, Szabolcs Nagy wrote:
> 
>> I think I addressed all of his comments in an acceptable way,
>> but i don't know if he had other concerns or if parts of the
>> code he has not reviewed yet.
> 
> The missing information in comments in previous versions effectively 
> prevented review of substantial parts of the code in those versions; 
> comments are critical information about the interfaces between different 
> parts of the code that are required for effective review (as well as for 
> subsequent maintainability of the code).
> 
> I think these changes are far too high risk for this late in the freeze.
 
Thanks for that review.

In which case let us defer to 2.29.

Szabolcs Nagy Aug. 7, 2018, 11:03 a.m. UTC | #14

On 06/07/18 09:47, Szabolcs Nagy wrote:
> Optimized exp, exp2, log, log2, pow, sinf, cosf and sincosf
> implementations.
> 
> I updated the patches according to comments. I included the
> sincosf patches from Wilco, but split out the math_config.h
> toint changes into a separate patch.
> 
> This set does not include the wrapper removal that changes ABI.
> I'm still testing those and they can be applied separately.
> The new exp increases the ulp error bounds in non-nearest
> rounding modes, so ulp updates are needed.
> 
> Szabolcs Nagy (8):
>    Clean up converttoint handling and document the semantics
>    Add new exp and exp2 implementations
>    aarch64: update libm-test-ulps
>    arm: update libm-test-ulps
>    x86_64: update libm-test-ulps
>    Add new log implementation
>    Add new log2 implementation
>    Add new pow implementation
> 
> Wilco Dijkstra (2):
>    Improve performance of sincosf
>    Improve performance of sinf and cosf
> 

ping.

[00/10] Optimized math routines

Message

Comments