Message ID | d4963b4e-0013-adaf-df2f-698cf421a487@arm.com |
---|---|
Headers | show |
Series | Optimized math routines | expand |
On 07/06/2018 04:47 AM, Szabolcs Nagy wrote: > Optimized exp, exp2, log, log2, pow, sinf, cosf and sincosf > implementations. Is it your intent to have these included in 2.28?
On 06/07/18 13:43, Carlos O'Donell wrote: > On 07/06/2018 04:47 AM, Szabolcs Nagy wrote: >> Optimized exp, exp2, log, log2, pow, sinf, cosf and sincosf >> implementations. > > Is it your intent to have these included in 2.28? > (resending as my previous mail seems to be lost) yes, i'd like to add it to the 'desirable in 2.28' list if Joseph is ok with the code, but i see he is not available right now for review. i don't know how other maintainers feel about such change, there needs to be an ulp update (i'm willing to do that for targets i can access hw for testing).
On 07/06/2018 11:46 AM, Szabolcs Nagy wrote: > On 06/07/18 13:43, Carlos O'Donell wrote: >> On 07/06/2018 04:47 AM, Szabolcs Nagy wrote: >>> Optimized exp, exp2, log, log2, pow, sinf, cosf and sincosf >>> implementations. >> >> Is it your intent to have these included in 2.28? >> > > (resending as my previous mail seems to be lost) > > yes, i'd like to add it to the 'desirable in 2.28' list > if Joseph is ok with the code, but i see he is not available > right now for review. > > i don't know how other maintainers feel about such change, > there needs to be an ulp update (i'm willing to do that for > targets i can access hw for testing). Where there any unanswered questions in your v4 review? Do you think v4 is basically as good as it will get? Who were the people who signed off on the review?
On 06/07/18 17:27, Carlos O'Donell wrote: > On 07/06/2018 11:46 AM, Szabolcs Nagy wrote: >> On 06/07/18 13:43, Carlos O'Donell wrote: >>> On 07/06/2018 04:47 AM, Szabolcs Nagy wrote: >>>> Optimized exp, exp2, log, log2, pow, sinf, cosf and sincosf >>>> implementations. >>> >>> Is it your intent to have these included in 2.28? >>> >> >> (resending as my previous mail seems to be lost) >> >> yes, i'd like to add it to the 'desirable in 2.28' list >> if Joseph is ok with the code, but i see he is not available >> right now for review. >> >> i don't know how other maintainers feel about such change, >> there needs to be an ulp update (i'm willing to do that for >> targets i can access hw for testing). > > Where there any unanswered questions in your v4 review? > > Do you think v4 is basically as good as it will get? > > Who were the people who signed off on the review? > Joseph Myers started the review of both the sinf, cosf, sincosf changes and the exp, exp2, log, log2, pow changes. I think I addressed all of his comments in an acceptable way, but i don't know if he had other concerns or if parts of the code he has not reviewed yet. Since the glibc tests pass on 3 different targets (and build-many-glibcs.py) i think there is no danger of the patch being completely broken. Wilco and I tested the patches in detail outside of glibc so it is the glibc integration where I expected most of the issues. I don't expect performance regression on any target, but it was not measured e.g. on powerpc (only aarch64 and x86_64) which might have different behaviour (previous sincosf was optimized on that target hence it might make sense to retest the new code to be sure). I think the patches are in a good quality state now. (The ABI changing part needs further work so i didn't post that.)
On 06/07/18 18:17, Szabolcs Nagy wrote: > On 06/07/18 17:27, Carlos O'Donell wrote: >> On 07/06/2018 11:46 AM, Szabolcs Nagy wrote: >>> On 06/07/18 13:43, Carlos O'Donell wrote: >>>> On 07/06/2018 04:47 AM, Szabolcs Nagy wrote: >>>>> Optimized exp, exp2, log, log2, pow, sinf, cosf and sincosf >>>>> implementations. >>>> >>>> Is it your intent to have these included in 2.28? >>>> >>> >>> (resending as my previous mail seems to be lost) >>> >>> yes, i'd like to add it to the 'desirable in 2.28' list >>> if Joseph is ok with the code, but i see he is not available >>> right now for review. >>> >>> i don't know how other maintainers feel about such change, >>> there needs to be an ulp update (i'm willing to do that for >>> targets i can access hw for testing). >> >> Where there any unanswered questions in your v4 review? >> >> Do you think v4 is basically as good as it will get? >> >> Who were the people who signed off on the review? >> > > Joseph Myers started the review of both the sinf, cosf, sincosf > changes and the exp, exp2, log, log2, pow changes. > > I think I addressed all of his comments in an acceptable way, > but i don't know if he had other concerns or if parts of the > code he has not reviewed yet. > > Since the glibc tests pass on 3 different targets (and > build-many-glibcs.py) i think there is no danger of the > patch being completely broken. Wilco and I tested the patches > in detail outside of glibc so it is the glibc integration where > I expected most of the issues. > > I don't expect performance regression on any target, but it > was not measured e.g. on powerpc (only aarch64 and x86_64) > which might have different behaviour (previous sincosf was > optimized on that target hence it might make sense to retest > the new code to be sure). > > I think the patches are in a good quality state now. > (The ABI changing part needs further work so i didn't post that.) built and tested on a power8 machine now, glibc math tests pass (except for an unrelated fmal failure), benchmark improvements are consistent with aarch64/x86_64, but it was a shared access machine so i won't post exact numbers, sincosf improved a bit too, sinf/cosf didn't (apparently powerpc has its own implementation).
On 09/07/2018 09:15, Szabolcs Nagy wrote: > On 06/07/18 18:17, Szabolcs Nagy wrote: >> On 06/07/18 17:27, Carlos O'Donell wrote: >>> On 07/06/2018 11:46 AM, Szabolcs Nagy wrote: >>>> On 06/07/18 13:43, Carlos O'Donell wrote: >>>>> On 07/06/2018 04:47 AM, Szabolcs Nagy wrote: >>>>>> Optimized exp, exp2, log, log2, pow, sinf, cosf and sincosf >>>>>> implementations. >>>>> >>>>> Is it your intent to have these included in 2.28? >>>>> >>>> >>>> (resending as my previous mail seems to be lost) >>>> >>>> yes, i'd like to add it to the 'desirable in 2.28' list >>>> if Joseph is ok with the code, but i see he is not available >>>> right now for review. >>>> >>>> i don't know how other maintainers feel about such change, >>>> there needs to be an ulp update (i'm willing to do that for >>>> targets i can access hw for testing). >>> >>> Where there any unanswered questions in your v4 review? >>> >>> Do you think v4 is basically as good as it will get? >>> >>> Who were the people who signed off on the review? >>> >> >> Joseph Myers started the review of both the sinf, cosf, sincosf >> changes and the exp, exp2, log, log2, pow changes. >> >> I think I addressed all of his comments in an acceptable way, >> but i don't know if he had other concerns or if parts of the >> code he has not reviewed yet. >> >> Since the glibc tests pass on 3 different targets (and >> build-many-glibcs.py) i think there is no danger of the >> patch being completely broken. Wilco and I tested the patches >> in detail outside of glibc so it is the glibc integration where >> I expected most of the issues. >> >> I don't expect performance regression on any target, but it >> was not measured e.g. on powerpc (only aarch64 and x86_64) >> which might have different behaviour (previous sincosf was >> optimized on that target hence it might make sense to retest >> the new code to be sure). >> >> I think the patches are in a good quality state now. >> (The ABI changing part needs further work so i didn't post that.) > > built and tested on a power8 machine now, glibc math > tests pass (except for an unrelated fmal failure), > benchmark improvements are consistent with aarch64/x86_64, > but it was a shared access machine so i won't post exact > numbers, sincosf improved a bit too, sinf/cosf didn't > (apparently powerpc has its own implementation). PowerPC sinf/cosf uses the same algorithm used on x86, I presume it would be a gain to generic implementation as well.
On 09/07/18 14:09, Adhemerval Zanella wrote: > On 09/07/2018 09:15, Szabolcs Nagy wrote: >> built and tested on a power8 machine now, glibc math >> tests pass (except for an unrelated fmal failure), >> benchmark improvements are consistent with aarch64/x86_64, >> but it was a shared access machine so i won't post exact >> numbers, sincosf improved a bit too, sinf/cosf didn't >> (apparently powerpc has its own implementation). > > PowerPC sinf/cosf uses the same algorithm used on x86, I presume > it would be a gain to generic implementation as well. > you mean the new implementation would be better or the target specific one? new implementation has better latency on this particular powerpc machine than the target specific code, but throughput is worse sometimes (using the default 0 setting for PREFER_FLOAT_COMPARISON).
On 09/07/2018 10:34, Szabolcs Nagy wrote: > On 09/07/18 14:09, Adhemerval Zanella wrote: >> On 09/07/2018 09:15, Szabolcs Nagy wrote: >>> built and tested on a power8 machine now, glibc math >>> tests pass (except for an unrelated fmal failure), >>> benchmark improvements are consistent with aarch64/x86_64, >>> but it was a shared access machine so i won't post exact >>> numbers, sincosf improved a bit too, sinf/cosf didn't >>> (apparently powerpc has its own implementation). >> >> PowerPC sinf/cosf uses the same algorithm used on x86, I presume >> it would be a gain to generic implementation as well. >> > > you mean the new implementation would be better or the > target specific one? > > new implementation has better latency on this particular > powerpc machine than the target specific code, but > throughput is worse sometimes (using the default 0 > setting for PREFER_FLOAT_COMPARISON). I did not measure, but I would expect. PowerPC uses an different implementation for generic code (s_sinf-ppc64.c) so comparing against it maybe misleading (since it use the old implementation still). I am not sure which compiler you used for evaluation, but at least Ubuntu 16.04 one (gcc 5.4) does not use POWER8 ISA as default and even with -mcpu=power8 it generates subpar code. I will try to check with a GCC 7.1 (but as for your environment, I am using a shared machine, although it I think I might get slight better results because it uses a micro-partition). For PREFER_FLOAT_COMPARISON, do we use this on glibc? I think it is only enabled on optimized-routines, isn't it?
On 09/07/18 15:26, Adhemerval Zanella wrote: > On 09/07/2018 10:34, Szabolcs Nagy wrote: >> On 09/07/18 14:09, Adhemerval Zanella wrote: >>> On 09/07/2018 09:15, Szabolcs Nagy wrote: >>>> built and tested on a power8 machine now, glibc math >>>> tests pass (except for an unrelated fmal failure), >>>> benchmark improvements are consistent with aarch64/x86_64, >>>> but it was a shared access machine so i won't post exact >>>> numbers, sincosf improved a bit too, sinf/cosf didn't >>>> (apparently powerpc has its own implementation). >>> >>> PowerPC sinf/cosf uses the same algorithm used on x86, I presume >>> it would be a gain to generic implementation as well. >>> >> >> you mean the new implementation would be better or the >> target specific one? >> >> new implementation has better latency on this particular >> powerpc machine than the target specific code, but >> throughput is worse sometimes (using the default 0 >> setting for PREFER_FLOAT_COMPARISON). > > I did not measure, but I would expect. PowerPC uses an different > implementation for generic code (s_sinf-ppc64.c) so comparing against > it maybe misleading (since it use the old implementation still). > i'm comparing two glibc builds, they both still use the same (old) code for sinf/cosf so there is nothing misleading. the sincosf code is generic though and the new implementation does show some speedup. > I am not sure which compiler you used for evaluation, but at least > Ubuntu 16.04 one (gcc 5.4) does not use POWER8 ISA as default and > even with -mcpu=power8 it generates subpar code. I will try to > check with a GCC 7.1 (but as for your environment, I am using > a shared machine, although it I think I might get slight better > results because it uses a micro-partition). > i built a gcc 7.3.0 toolchain, the host toolchain would not be able to build glibc (gcc-4.8), same for the host make (3.82). (it's a gcc build farm machine) > For PREFER_FLOAT_COMPARISON, do we use this on glibc? I think > it is only enabled on optimized-routines, isn't it? it is disabled by default, it is there so targets can enable it if float compares are faster than using the representation, currently disabled everywhere in glibc. (i don't want to change that setting now, that case can be tweaked later)
On 09/07/2018 12:41, Szabolcs Nagy wrote: > On 09/07/18 15:26, Adhemerval Zanella wrote: >> On 09/07/2018 10:34, Szabolcs Nagy wrote: >>> On 09/07/18 14:09, Adhemerval Zanella wrote: >>>> On 09/07/2018 09:15, Szabolcs Nagy wrote: >>>>> built and tested on a power8 machine now, glibc math >>>>> tests pass (except for an unrelated fmal failure), >>>>> benchmark improvements are consistent with aarch64/x86_64, >>>>> but it was a shared access machine so i won't post exact >>>>> numbers, sincosf improved a bit too, sinf/cosf didn't >>>>> (apparently powerpc has its own implementation). >>>> >>>> PowerPC sinf/cosf uses the same algorithm used on x86, I presume >>>> it would be a gain to generic implementation as well. >>>> >>> >>> you mean the new implementation would be better or the >>> target specific one? >>> >>> new implementation has better latency on this particular >>> powerpc machine than the target specific code, but >>> throughput is worse sometimes (using the default 0 >>> setting for PREFER_FLOAT_COMPARISON). >> >> I did not measure, but I would expect. PowerPC uses an different >> implementation for generic code (s_sinf-ppc64.c) so comparing against >> it maybe misleading (since it use the old implementation still). >> > > i'm comparing two glibc builds, they both still use the > same (old) code for sinf/cosf so there is nothing misleading. I meant comparing the generic s_sinf against powerpc's one (since default ifunc selection for powerpc is not the generic is not sysdeps/ieee754/flt-32/s_sinf.c). But indeed this is not for this case, sorry for the noise. > > the sincosf code is generic though and the new implementation > does show some speedup. > >> I am not sure which compiler you used for evaluation, but at least >> Ubuntu 16.04 one (gcc 5.4) does not use POWER8 ISA as default and >> even with -mcpu=power8 it generates subpar code. I will try to >> check with a GCC 7.1 (but as for your environment, I am using >> a shared machine, although it I think I might get slight better >> results because it uses a micro-partition). >> > > i built a gcc 7.3.0 toolchain, the host toolchain would > not be able to build glibc (gcc-4.8), same for the host > make (3.82). (it's a gcc build farm machine) Using glibc benchmarks, I also see better results with power8 implementation: s_sinf-power8: "sinf": { "": { "duration": 5.12725e+09, "iterations": 7.03494e+08, "max": 983.08, "min": 6.06, "mean": 7.28827 } } s_sinf-ppc64: "sinf": { "": { "duration": 5.13064e+09, "iterations": 1.86048e+08, "max": 1032.52, "min": 8.035, "mean": 27.577 } } generic s_sinf: "sinf": { "": { "duration": 5.12404e+09, "iterations": 6.74424e+08, "max": 515.97, "min": 6.089, "mean": 7.59765 } } One remark is I think we can get rid of generic powerpc sinf (sysdeps/powerpc/fpu/s_sinf.c) and use generic implementation instead. > >> For PREFER_FLOAT_COMPARISON, do we use this on glibc? I think >> it is only enabled on optimized-routines, isn't it? > > it is disabled by default, it is there so targets can enable > it if float compares are faster than using the representation, > currently disabled everywhere in glibc. > (i don't want to change that setting now, that case can > be tweaked later)
On 06/07/18 17:27, Carlos O'Donell wrote: > On 07/06/2018 11:46 AM, Szabolcs Nagy wrote: >> On 06/07/18 13:43, Carlos O'Donell wrote: >>> On 07/06/2018 04:47 AM, Szabolcs Nagy wrote: >>>> Optimized exp, exp2, log, log2, pow, sinf, cosf and sincosf >>>> implementations. >>> >>> Is it your intent to have these included in 2.28? >>> >> >> (resending as my previous mail seems to be lost) >> >> yes, i'd like to add it to the 'desirable in 2.28' list >> if Joseph is ok with the code, but i see he is not available >> right now for review. >> >> i don't know how other maintainers feel about such change, >> there needs to be an ulp update (i'm willing to do that for >> targets i can access hw for testing). > > Where there any unanswered questions in your v4 review? > > Do you think v4 is basically as good as it will get? > > Who were the people who signed off on the review? > i made two minor modifications (codegen is not affected): - removed an unused configuration option from sincosf. - fixed a comment in pow. i ran glibc tests on powerpc64le-linux-gnu and i686-linux-gnu too now. i will also update the improvement numbers in the commit messages to reflect dynamic linked master vs new instead of static linked master vs nowrapper improvements (i didn't repost the patchset just to change those numbers though). i propose this change to be included in glibc 2.28 (added it to the desired features), i'm willing to do further tweaks if necessary. i believe the code provides the documented math quality guarantees and does not introduce regressions on any target (other than ulp bound update is needed after the exp patch). the wrapper removal is deferred to glibc 2.29 (it requires some changes to the libm alias machinery, it changes abi and target maintainers may need to tweak the ifunc mechanism where appropriate for optimal performance)
On Fri, 6 Jul 2018, Szabolcs Nagy wrote: > I think I addressed all of his comments in an acceptable way, > but i don't know if he had other concerns or if parts of the > code he has not reviewed yet. The missing information in comments in previous versions effectively prevented review of substantial parts of the code in those versions; comments are critical information about the interfaces between different parts of the code that are required for effective review (as well as for subsequent maintainability of the code). I think these changes are far too high risk for this late in the freeze.
On 07/17/2018 05:59 PM, Joseph Myers wrote: > On Fri, 6 Jul 2018, Szabolcs Nagy wrote: > >> I think I addressed all of his comments in an acceptable way, >> but i don't know if he had other concerns or if parts of the >> code he has not reviewed yet. > > The missing information in comments in previous versions effectively > prevented review of substantial parts of the code in those versions; > comments are critical information about the interfaces between different > parts of the code that are required for effective review (as well as for > subsequent maintainability of the code). > > I think these changes are far too high risk for this late in the freeze. Thanks for that review. In which case let us defer to 2.29.
On 06/07/18 09:47, Szabolcs Nagy wrote: > Optimized exp, exp2, log, log2, pow, sinf, cosf and sincosf > implementations. > > I updated the patches according to comments. I included the > sincosf patches from Wilco, but split out the math_config.h > toint changes into a separate patch. > > This set does not include the wrapper removal that changes ABI. > I'm still testing those and they can be applied separately. > The new exp increases the ulp error bounds in non-nearest > rounding modes, so ulp updates are needed. > > Szabolcs Nagy (8): > Clean up converttoint handling and document the semantics > Add new exp and exp2 implementations > aarch64: update libm-test-ulps > arm: update libm-test-ulps > x86_64: update libm-test-ulps > Add new log implementation > Add new log2 implementation > Add new pow implementation > > Wilco Dijkstra (2): > Improve performance of sincosf > Improve performance of sinf and cosf > ping.