diff mbox series

libstdc++: Optimize std::gcd

Message ID 9e5636e8-224f-4dde-9c60-489b4aad6534@gmail.com
State New
Headers show
Series libstdc++: Optimize std::gcd | expand

Commit Message

Stephen Face June 7, 2024, 5:29 a.m. UTC
This patch is to optimize the runtime execution of gcd. Mathematically,
it computes with the same algorithm as before, but subtractions and
branches are rearranged to encourage generation of code that can use
flags from the subtractions for conditional moves. Additionally, most
pairs of integers are coprime, so this patch also includes a check for
one of the integers to be equal to 1, and then it will exit the loop
early in this case.

libstdc++-v3/ChangeLog:

	* include/std/numeric(__gcd): Optimize.
---
I have tested this on x86_64-linux and aarch64-linux. I have tested the
timing with random distributions of small inputs and large inputs on a
couple of machines with -O2 and found decreases in execution time from
20% to 60% depending on the machine and distribution of inputs.

 libstdc++-v3/include/std/numeric | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

Comments

Sam James June 7, 2024, 8:57 a.m. UTC | #1
Stephen Face <shpface@gmail.com> writes:

> This patch is to optimize the runtime execution of gcd. Mathematically,
> it computes with the same algorithm as before, but subtractions and
> branches are rearranged to encourage generation of code that can use
> flags from the subtractions for conditional moves. Additionally, most
> pairs of integers are coprime, so this patch also includes a check for
> one of the integers to be equal to 1, and then it will exit the loop
> early in this case.

Is it worth filing a bug for the missed optimisation? You shouldn't have
to write things in a specific order. Thanks.

>
> libstdc++-v3/ChangeLog:
>
> 	* include/std/numeric(__gcd): Optimize.
> ---
> I have tested this on x86_64-linux and aarch64-linux. I have tested the
> timing with random distributions of small inputs and large inputs on a
> couple of machines with -O2 and found decreases in execution time from
> 20% to 60% depending on the machine and distribution of inputs.
>
>  libstdc++-v3/include/std/numeric | 21 +++++++++++----------
>  1 file changed, 11 insertions(+), 10 deletions(-)
>
> diff --git a/libstdc++-v3/include/std/numeric b/libstdc++-v3/include/std/numeric
> index c912db4a519..3c9e8387a0e 100644
> --- a/libstdc++-v3/include/std/numeric
> +++ b/libstdc++-v3/include/std/numeric
> @@ -148,19 +148,20 @@ namespace __detail
>  
>        while (true)
>  	{
> -	  if (__m > __n)
> -	    {
> -	      _Tp __tmp = __m;
> -	      __m = __n;
> -	      __n = __tmp;
> -	    }
> +	  _Tp __m_minus_n = __m - __n;
> +	  if (__m_minus_n == 0)
> +	    return __m << __k;
>  
> -	  __n -= __m;
> +	  _Tp __next_m = __m < __n ? __m : __n;
>  
> -	  if (__n == 0)
> -	    return __m << __k;
> +	  if (__next_m == 1)
> +	    return __next_m << __k;
> +
> +	  _Tp __n_minus_m = __n - __m;
> +	  __n = __n < __m ? __m_minus_n : __n_minus_m;
> +	  __m = __next_m;
>  
> -	  __n >>= std::__countr_zero(__n);
> +	  __n >>= std::__countr_zero(__m_minus_n);
>  	}
>      }
>  } // namespace __detail
Jonathan Wakely June 7, 2024, 9:30 a.m. UTC | #2
On Fri, 7 Jun 2024 at 09:57, Sam James <sam@gentoo.org> wrote:
>
> Stephen Face <shpface@gmail.com> writes:
>
> > This patch is to optimize the runtime execution of gcd. Mathematically,
> > it computes with the same algorithm as before, but subtractions and
> > branches are rearranged to encourage generation of code that can use
> > flags from the subtractions for conditional moves. Additionally, most
> > pairs of integers are coprime, so this patch also includes a check for
> > one of the integers to be equal to 1, and then it will exit the loop
> > early in this case.
>
> Is it worth filing a bug for the missed optimisation? You shouldn't have
> to write things in a specific order. Thanks.

Yes, I think a bug report would be good. But 20%-60% decreases in run
time seems significant enough that we should take the libstdc++ patch
now, rather than wait for a possible compiler fix to come later.

Stephen, could you please confirm whether you have a copyright
assignment in place for GCC, and if not whether you would be will to
complete one, or alternatively contribute this under the DCO terms:
https://gcc.gnu.org/dco.html
Thanks!


>
> >
> > libstdc++-v3/ChangeLog:
> >
> >       * include/std/numeric(__gcd): Optimize.
> > ---
> > I have tested this on x86_64-linux and aarch64-linux. I have tested the
> > timing with random distributions of small inputs and large inputs on a
> > couple of machines with -O2 and found decreases in execution time from
> > 20% to 60% depending on the machine and distribution of inputs.
> >
> >  libstdc++-v3/include/std/numeric | 21 +++++++++++----------
> >  1 file changed, 11 insertions(+), 10 deletions(-)
> >
> > diff --git a/libstdc++-v3/include/std/numeric b/libstdc++-v3/include/std/numeric
> > index c912db4a519..3c9e8387a0e 100644
> > --- a/libstdc++-v3/include/std/numeric
> > +++ b/libstdc++-v3/include/std/numeric
> > @@ -148,19 +148,20 @@ namespace __detail
> >
> >        while (true)
> >       {
> > -       if (__m > __n)
> > -         {
> > -           _Tp __tmp = __m;
> > -           __m = __n;
> > -           __n = __tmp;
> > -         }
> > +       _Tp __m_minus_n = __m - __n;
> > +       if (__m_minus_n == 0)
> > +         return __m << __k;
> >
> > -       __n -= __m;
> > +       _Tp __next_m = __m < __n ? __m : __n;
> >
> > -       if (__n == 0)
> > -         return __m << __k;
> > +       if (__next_m == 1)
> > +         return __next_m << __k;
> > +
> > +       _Tp __n_minus_m = __n - __m;
> > +       __n = __n < __m ? __m_minus_n : __n_minus_m;
> > +       __m = __next_m;
> >
> > -       __n >>= std::__countr_zero(__n);
> > +       __n >>= std::__countr_zero(__m_minus_n);
> >       }
> >      }
> >  } // namespace __detail
>
Stephen Face June 7, 2024, 6:42 p.m. UTC | #3
On 6/7/24 2:30 AM, Jonathan Wakely wrote:
> On Fri, 7 Jun 2024 at 09:57, Sam James <sam@gentoo.org> wrote:
>>
>> Stephen Face <shpface@gmail.com> writes:
>>
>>> This patch is to optimize the runtime execution of gcd. Mathematically,
>>> it computes with the same algorithm as before, but subtractions and
>>> branches are rearranged to encourage generation of code that can use
>>> flags from the subtractions for conditional moves. Additionally, most
>>> pairs of integers are coprime, so this patch also includes a check for
>>> one of the integers to be equal to 1, and then it will exit the loop
>>> early in this case.
>>
>> Is it worth filing a bug for the missed optimisation? You shouldn't have
>> to write things in a specific order. Thanks.

This patch is not a pure reordering. It has the added comparison with 1.
Also, the argument to the trailing zero count is now the result of the
subtraction before the comparison of m and n. This shortens the chain of
dependencies, which can help the loop run faster.

> 
> Yes, I think a bug report would be good. But 20%-60% decreases in run
> time seems significant enough that we should take the libstdc++ patch
> now, rather than wait for a possible compiler fix to come later.
> 
> Stephen, could you please confirm whether you have a copyright
> assignment in place for GCC, and if not whether you would be will to
> complete one, or alternatively contribute this under the DCO terms:
> https://gcc.gnu.org/dco.html
> Thanks!

I do not have a copyright assignment in place for GCC. I can certify to
the DCO terms for this contribution.

    Signed-off-by: Stephen Face <shpface@gmail.com>

> 
> 
>>
>>>
>>> libstdc++-v3/ChangeLog:
>>>
>>>       * include/std/numeric(__gcd): Optimize.
>>> ---
>>> I have tested this on x86_64-linux and aarch64-linux. I have tested the
>>> timing with random distributions of small inputs and large inputs on a
>>> couple of machines with -O2 and found decreases in execution time from
>>> 20% to 60% depending on the machine and distribution of inputs.
>>>
>>>  libstdc++-v3/include/std/numeric | 21 +++++++++++----------
>>>  1 file changed, 11 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/libstdc++-v3/include/std/numeric b/libstdc++-v3/include/std/numeric
>>> index c912db4a519..3c9e8387a0e 100644
>>> --- a/libstdc++-v3/include/std/numeric
>>> +++ b/libstdc++-v3/include/std/numeric
>>> @@ -148,19 +148,20 @@ namespace __detail
>>>
>>>        while (true)
>>>       {
>>> -       if (__m > __n)
>>> -         {
>>> -           _Tp __tmp = __m;
>>> -           __m = __n;
>>> -           __n = __tmp;
>>> -         }
>>> +       _Tp __m_minus_n = __m - __n;
>>> +       if (__m_minus_n == 0)
>>> +         return __m << __k;
>>>
>>> -       __n -= __m;
>>> +       _Tp __next_m = __m < __n ? __m : __n;
>>>
>>> -       if (__n == 0)
>>> -         return __m << __k;
>>> +       if (__next_m == 1)
>>> +         return __next_m << __k;
>>> +
>>> +       _Tp __n_minus_m = __n - __m;
>>> +       __n = __n < __m ? __m_minus_n : __n_minus_m;
>>> +       __m = __next_m;
>>>
>>> -       __n >>= std::__countr_zero(__n);
>>> +       __n >>= std::__countr_zero(__m_minus_n);
>>>       }
>>>      }
>>>  } // namespace __detail
>>
>
Jonathan Wakely June 7, 2024, 6:48 p.m. UTC | #4
On Fri, 7 Jun 2024 at 19:42, Stephen Face <shpface@gmail.com> wrote:
>
> On 6/7/24 2:30 AM, Jonathan Wakely wrote:
> > On Fri, 7 Jun 2024 at 09:57, Sam James <sam@gentoo.org> wrote:
> >>
> >> Stephen Face <shpface@gmail.com> writes:
> >>
> >>> This patch is to optimize the runtime execution of gcd. Mathematically,
> >>> it computes with the same algorithm as before, but subtractions and
> >>> branches are rearranged to encourage generation of code that can use
> >>> flags from the subtractions for conditional moves. Additionally, most
> >>> pairs of integers are coprime, so this patch also includes a check for
> >>> one of the integers to be equal to 1, and then it will exit the loop
> >>> early in this case.
> >>
> >> Is it worth filing a bug for the missed optimisation? You shouldn't have
> >> to write things in a specific order. Thanks.
>
> This patch is not a pure reordering. It has the added comparison with 1.
> Also, the argument to the trailing zero count is now the result of the
> subtraction before the comparison of m and n. This shortens the chain of
> dependencies, which can help the loop run faster.
>
> >
> > Yes, I think a bug report would be good. But 20%-60% decreases in run
> > time seems significant enough that we should take the libstdc++ patch
> > now, rather than wait for a possible compiler fix to come later.
> >
> > Stephen, could you please confirm whether you have a copyright
> > assignment in place for GCC, and if not whether you would be will to
> > complete one, or alternatively contribute this under the DCO terms:
> > https://gcc.gnu.org/dco.html
> > Thanks!
>
> I do not have a copyright assignment in place for GCC. I can certify to
> the DCO terms for this contribution.
>
>     Signed-off-by: Stephen Face <shpface@gmail.com>

Thank you! I'll take care of this patch next week.


>
> >
> >
> >>
> >>>
> >>> libstdc++-v3/ChangeLog:
> >>>
> >>>       * include/std/numeric(__gcd): Optimize.
> >>> ---
> >>> I have tested this on x86_64-linux and aarch64-linux. I have tested the
> >>> timing with random distributions of small inputs and large inputs on a
> >>> couple of machines with -O2 and found decreases in execution time from
> >>> 20% to 60% depending on the machine and distribution of inputs.
> >>>
> >>>  libstdc++-v3/include/std/numeric | 21 +++++++++++----------
> >>>  1 file changed, 11 insertions(+), 10 deletions(-)
> >>>
> >>> diff --git a/libstdc++-v3/include/std/numeric b/libstdc++-v3/include/std/numeric
> >>> index c912db4a519..3c9e8387a0e 100644
> >>> --- a/libstdc++-v3/include/std/numeric
> >>> +++ b/libstdc++-v3/include/std/numeric
> >>> @@ -148,19 +148,20 @@ namespace __detail
> >>>
> >>>        while (true)
> >>>       {
> >>> -       if (__m > __n)
> >>> -         {
> >>> -           _Tp __tmp = __m;
> >>> -           __m = __n;
> >>> -           __n = __tmp;
> >>> -         }
> >>> +       _Tp __m_minus_n = __m - __n;
> >>> +       if (__m_minus_n == 0)
> >>> +         return __m << __k;
> >>>
> >>> -       __n -= __m;
> >>> +       _Tp __next_m = __m < __n ? __m : __n;
> >>>
> >>> -       if (__n == 0)
> >>> -         return __m << __k;
> >>> +       if (__next_m == 1)
> >>> +         return __next_m << __k;
> >>> +
> >>> +       _Tp __n_minus_m = __n - __m;
> >>> +       __n = __n < __m ? __m_minus_n : __n_minus_m;
> >>> +       __m = __next_m;
> >>>
> >>> -       __n >>= std::__countr_zero(__n);
> >>> +       __n >>= std::__countr_zero(__m_minus_n);
> >>>       }
> >>>      }
> >>>  } // namespace __detail
> >>
> >
>
diff mbox series

Patch

diff --git a/libstdc++-v3/include/std/numeric b/libstdc++-v3/include/std/numeric
index c912db4a519..3c9e8387a0e 100644
--- a/libstdc++-v3/include/std/numeric
+++ b/libstdc++-v3/include/std/numeric
@@ -148,19 +148,20 @@  namespace __detail
 
       while (true)
 	{
-	  if (__m > __n)
-	    {
-	      _Tp __tmp = __m;
-	      __m = __n;
-	      __n = __tmp;
-	    }
+	  _Tp __m_minus_n = __m - __n;
+	  if (__m_minus_n == 0)
+	    return __m << __k;
 
-	  __n -= __m;
+	  _Tp __next_m = __m < __n ? __m : __n;
 
-	  if (__n == 0)
-	    return __m << __k;
+	  if (__next_m == 1)
+	    return __next_m << __k;
+
+	  _Tp __n_minus_m = __n - __m;
+	  __n = __n < __m ? __m_minus_n : __n_minus_m;
+	  __m = __next_m;
 
-	  __n >>= std::__countr_zero(__n);
+	  __n >>= std::__countr_zero(__m_minus_n);
 	}
     }
 } // namespace __detail