Message ID | 9e5636e8-224f-4dde-9c60-489b4aad6534@gmail.com |
---|---|
State | New |
Headers | show |
Series | libstdc++: Optimize std::gcd | expand |
Stephen Face <shpface@gmail.com> writes: > This patch is to optimize the runtime execution of gcd. Mathematically, > it computes with the same algorithm as before, but subtractions and > branches are rearranged to encourage generation of code that can use > flags from the subtractions for conditional moves. Additionally, most > pairs of integers are coprime, so this patch also includes a check for > one of the integers to be equal to 1, and then it will exit the loop > early in this case. Is it worth filing a bug for the missed optimisation? You shouldn't have to write things in a specific order. Thanks. > > libstdc++-v3/ChangeLog: > > * include/std/numeric(__gcd): Optimize. > --- > I have tested this on x86_64-linux and aarch64-linux. I have tested the > timing with random distributions of small inputs and large inputs on a > couple of machines with -O2 and found decreases in execution time from > 20% to 60% depending on the machine and distribution of inputs. > > libstdc++-v3/include/std/numeric | 21 +++++++++++---------- > 1 file changed, 11 insertions(+), 10 deletions(-) > > diff --git a/libstdc++-v3/include/std/numeric b/libstdc++-v3/include/std/numeric > index c912db4a519..3c9e8387a0e 100644 > --- a/libstdc++-v3/include/std/numeric > +++ b/libstdc++-v3/include/std/numeric > @@ -148,19 +148,20 @@ namespace __detail > > while (true) > { > - if (__m > __n) > - { > - _Tp __tmp = __m; > - __m = __n; > - __n = __tmp; > - } > + _Tp __m_minus_n = __m - __n; > + if (__m_minus_n == 0) > + return __m << __k; > > - __n -= __m; > + _Tp __next_m = __m < __n ? __m : __n; > > - if (__n == 0) > - return __m << __k; > + if (__next_m == 1) > + return __next_m << __k; > + > + _Tp __n_minus_m = __n - __m; > + __n = __n < __m ? __m_minus_n : __n_minus_m; > + __m = __next_m; > > - __n >>= std::__countr_zero(__n); > + __n >>= std::__countr_zero(__m_minus_n); > } > } > } // namespace __detail
On Fri, 7 Jun 2024 at 09:57, Sam James <sam@gentoo.org> wrote: > > Stephen Face <shpface@gmail.com> writes: > > > This patch is to optimize the runtime execution of gcd. Mathematically, > > it computes with the same algorithm as before, but subtractions and > > branches are rearranged to encourage generation of code that can use > > flags from the subtractions for conditional moves. Additionally, most > > pairs of integers are coprime, so this patch also includes a check for > > one of the integers to be equal to 1, and then it will exit the loop > > early in this case. > > Is it worth filing a bug for the missed optimisation? You shouldn't have > to write things in a specific order. Thanks. Yes, I think a bug report would be good. But 20%-60% decreases in run time seems significant enough that we should take the libstdc++ patch now, rather than wait for a possible compiler fix to come later. Stephen, could you please confirm whether you have a copyright assignment in place for GCC, and if not whether you would be will to complete one, or alternatively contribute this under the DCO terms: https://gcc.gnu.org/dco.html Thanks! > > > > > libstdc++-v3/ChangeLog: > > > > * include/std/numeric(__gcd): Optimize. > > --- > > I have tested this on x86_64-linux and aarch64-linux. I have tested the > > timing with random distributions of small inputs and large inputs on a > > couple of machines with -O2 and found decreases in execution time from > > 20% to 60% depending on the machine and distribution of inputs. > > > > libstdc++-v3/include/std/numeric | 21 +++++++++++---------- > > 1 file changed, 11 insertions(+), 10 deletions(-) > > > > diff --git a/libstdc++-v3/include/std/numeric b/libstdc++-v3/include/std/numeric > > index c912db4a519..3c9e8387a0e 100644 > > --- a/libstdc++-v3/include/std/numeric > > +++ b/libstdc++-v3/include/std/numeric > > @@ -148,19 +148,20 @@ namespace __detail > > > > while (true) > > { > > - if (__m > __n) > > - { > > - _Tp __tmp = __m; > > - __m = __n; > > - __n = __tmp; > > - } > > + _Tp __m_minus_n = __m - __n; > > + if (__m_minus_n == 0) > > + return __m << __k; > > > > - __n -= __m; > > + _Tp __next_m = __m < __n ? __m : __n; > > > > - if (__n == 0) > > - return __m << __k; > > + if (__next_m == 1) > > + return __next_m << __k; > > + > > + _Tp __n_minus_m = __n - __m; > > + __n = __n < __m ? __m_minus_n : __n_minus_m; > > + __m = __next_m; > > > > - __n >>= std::__countr_zero(__n); > > + __n >>= std::__countr_zero(__m_minus_n); > > } > > } > > } // namespace __detail >
On 6/7/24 2:30 AM, Jonathan Wakely wrote: > On Fri, 7 Jun 2024 at 09:57, Sam James <sam@gentoo.org> wrote: >> >> Stephen Face <shpface@gmail.com> writes: >> >>> This patch is to optimize the runtime execution of gcd. Mathematically, >>> it computes with the same algorithm as before, but subtractions and >>> branches are rearranged to encourage generation of code that can use >>> flags from the subtractions for conditional moves. Additionally, most >>> pairs of integers are coprime, so this patch also includes a check for >>> one of the integers to be equal to 1, and then it will exit the loop >>> early in this case. >> >> Is it worth filing a bug for the missed optimisation? You shouldn't have >> to write things in a specific order. Thanks. This patch is not a pure reordering. It has the added comparison with 1. Also, the argument to the trailing zero count is now the result of the subtraction before the comparison of m and n. This shortens the chain of dependencies, which can help the loop run faster. > > Yes, I think a bug report would be good. But 20%-60% decreases in run > time seems significant enough that we should take the libstdc++ patch > now, rather than wait for a possible compiler fix to come later. > > Stephen, could you please confirm whether you have a copyright > assignment in place for GCC, and if not whether you would be will to > complete one, or alternatively contribute this under the DCO terms: > https://gcc.gnu.org/dco.html > Thanks! I do not have a copyright assignment in place for GCC. I can certify to the DCO terms for this contribution. Signed-off-by: Stephen Face <shpface@gmail.com> > > >> >>> >>> libstdc++-v3/ChangeLog: >>> >>> * include/std/numeric(__gcd): Optimize. >>> --- >>> I have tested this on x86_64-linux and aarch64-linux. I have tested the >>> timing with random distributions of small inputs and large inputs on a >>> couple of machines with -O2 and found decreases in execution time from >>> 20% to 60% depending on the machine and distribution of inputs. >>> >>> libstdc++-v3/include/std/numeric | 21 +++++++++++---------- >>> 1 file changed, 11 insertions(+), 10 deletions(-) >>> >>> diff --git a/libstdc++-v3/include/std/numeric b/libstdc++-v3/include/std/numeric >>> index c912db4a519..3c9e8387a0e 100644 >>> --- a/libstdc++-v3/include/std/numeric >>> +++ b/libstdc++-v3/include/std/numeric >>> @@ -148,19 +148,20 @@ namespace __detail >>> >>> while (true) >>> { >>> - if (__m > __n) >>> - { >>> - _Tp __tmp = __m; >>> - __m = __n; >>> - __n = __tmp; >>> - } >>> + _Tp __m_minus_n = __m - __n; >>> + if (__m_minus_n == 0) >>> + return __m << __k; >>> >>> - __n -= __m; >>> + _Tp __next_m = __m < __n ? __m : __n; >>> >>> - if (__n == 0) >>> - return __m << __k; >>> + if (__next_m == 1) >>> + return __next_m << __k; >>> + >>> + _Tp __n_minus_m = __n - __m; >>> + __n = __n < __m ? __m_minus_n : __n_minus_m; >>> + __m = __next_m; >>> >>> - __n >>= std::__countr_zero(__n); >>> + __n >>= std::__countr_zero(__m_minus_n); >>> } >>> } >>> } // namespace __detail >> >
On Fri, 7 Jun 2024 at 19:42, Stephen Face <shpface@gmail.com> wrote: > > On 6/7/24 2:30 AM, Jonathan Wakely wrote: > > On Fri, 7 Jun 2024 at 09:57, Sam James <sam@gentoo.org> wrote: > >> > >> Stephen Face <shpface@gmail.com> writes: > >> > >>> This patch is to optimize the runtime execution of gcd. Mathematically, > >>> it computes with the same algorithm as before, but subtractions and > >>> branches are rearranged to encourage generation of code that can use > >>> flags from the subtractions for conditional moves. Additionally, most > >>> pairs of integers are coprime, so this patch also includes a check for > >>> one of the integers to be equal to 1, and then it will exit the loop > >>> early in this case. > >> > >> Is it worth filing a bug for the missed optimisation? You shouldn't have > >> to write things in a specific order. Thanks. > > This patch is not a pure reordering. It has the added comparison with 1. > Also, the argument to the trailing zero count is now the result of the > subtraction before the comparison of m and n. This shortens the chain of > dependencies, which can help the loop run faster. > > > > > Yes, I think a bug report would be good. But 20%-60% decreases in run > > time seems significant enough that we should take the libstdc++ patch > > now, rather than wait for a possible compiler fix to come later. > > > > Stephen, could you please confirm whether you have a copyright > > assignment in place for GCC, and if not whether you would be will to > > complete one, or alternatively contribute this under the DCO terms: > > https://gcc.gnu.org/dco.html > > Thanks! > > I do not have a copyright assignment in place for GCC. I can certify to > the DCO terms for this contribution. > > Signed-off-by: Stephen Face <shpface@gmail.com> Thank you! I'll take care of this patch next week. > > > > > > >> > >>> > >>> libstdc++-v3/ChangeLog: > >>> > >>> * include/std/numeric(__gcd): Optimize. > >>> --- > >>> I have tested this on x86_64-linux and aarch64-linux. I have tested the > >>> timing with random distributions of small inputs and large inputs on a > >>> couple of machines with -O2 and found decreases in execution time from > >>> 20% to 60% depending on the machine and distribution of inputs. > >>> > >>> libstdc++-v3/include/std/numeric | 21 +++++++++++---------- > >>> 1 file changed, 11 insertions(+), 10 deletions(-) > >>> > >>> diff --git a/libstdc++-v3/include/std/numeric b/libstdc++-v3/include/std/numeric > >>> index c912db4a519..3c9e8387a0e 100644 > >>> --- a/libstdc++-v3/include/std/numeric > >>> +++ b/libstdc++-v3/include/std/numeric > >>> @@ -148,19 +148,20 @@ namespace __detail > >>> > >>> while (true) > >>> { > >>> - if (__m > __n) > >>> - { > >>> - _Tp __tmp = __m; > >>> - __m = __n; > >>> - __n = __tmp; > >>> - } > >>> + _Tp __m_minus_n = __m - __n; > >>> + if (__m_minus_n == 0) > >>> + return __m << __k; > >>> > >>> - __n -= __m; > >>> + _Tp __next_m = __m < __n ? __m : __n; > >>> > >>> - if (__n == 0) > >>> - return __m << __k; > >>> + if (__next_m == 1) > >>> + return __next_m << __k; > >>> + > >>> + _Tp __n_minus_m = __n - __m; > >>> + __n = __n < __m ? __m_minus_n : __n_minus_m; > >>> + __m = __next_m; > >>> > >>> - __n >>= std::__countr_zero(__n); > >>> + __n >>= std::__countr_zero(__m_minus_n); > >>> } > >>> } > >>> } // namespace __detail > >> > > >
diff --git a/libstdc++-v3/include/std/numeric b/libstdc++-v3/include/std/numeric index c912db4a519..3c9e8387a0e 100644 --- a/libstdc++-v3/include/std/numeric +++ b/libstdc++-v3/include/std/numeric @@ -148,19 +148,20 @@ namespace __detail while (true) { - if (__m > __n) - { - _Tp __tmp = __m; - __m = __n; - __n = __tmp; - } + _Tp __m_minus_n = __m - __n; + if (__m_minus_n == 0) + return __m << __k; - __n -= __m; + _Tp __next_m = __m < __n ? __m : __n; - if (__n == 0) - return __m << __k; + if (__next_m == 1) + return __next_m << __k; + + _Tp __n_minus_m = __n - __m; + __n = __n < __m ? __m_minus_n : __n_minus_m; + __m = __next_m; - __n >>= std::__countr_zero(__n); + __n >>= std::__countr_zero(__m_minus_n); } } } // namespace __detail