x86: Enable non-temporal memset tunable for AMD

Message ID 20240607230447.52478-1-jdamato@fastly.com
State New
Series x86: Enable non-temporal memset tunable for AMD

Commit Message

Joe Damato June 7, 2024, 11:04 p.m. UTC
In commit 46b5e98ef6f1 ("x86: Add seperate non-temporal tunable for
memset") a tunable threshold for enabling non-temporal memset was added,
but only for Intel hardware.

Since that commit, new benchmark results suggest that non-temporal
memset is beneficial on AMD, as well, so allow this tunable to be set
for AMD.

See:
https://docs.google.com/spreadsheets/d/1opzukzvum4n6-RUVHTGddV6RjAEil4P2uMjjQGLbLcU/edit?usp=sharing
which has been updated to include data using different strategies for
large memset on AMD Zen2, Zen3, and Zen4.

Signed-off-by: Joe Damato <jdamato@fastly.com>
---
 sysdeps/x86/dl-cacheinfo.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
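
For readers who want a rough feel for large-memset cost on their own machine, a minimal timing sketch follows. It is not the methodology behind the spreadsheet above; the helper name `time_memset_ns` and the buffer sizes are hypothetical, and the compiler barrier assumes GCC or Clang.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper (not part of glibc or its benchtests): time ITERS
   memset calls over SIZE bytes and return the average nanoseconds per
   call, or -1.0 if the buffer cannot be allocated.  */
static double
time_memset_ns (size_t size, int iters)
{
  assert (size > 0 && iters > 0);

  unsigned char *buf = malloc (size);
  if (buf == NULL)
    return -1.0;

  /* Touch every page once so page faults are not part of the timing.  */
  memset (buf, 1, size);

  struct timespec start, end;
  clock_gettime (CLOCK_MONOTONIC, &start);
  for (int i = 0; i < iters; i++)
    {
      memset (buf, i & 0xff, size);
      /* Compiler barrier so the stores are not optimized away.  */
      __asm__ volatile ("" : : "r" (buf) : "memory");
    }
  clock_gettime (CLOCK_MONOTONIC, &end);

  free (buf);

  double ns = (end.tv_sec - start.tv_sec) * 1e9
              + (end.tv_nsec - start.tv_nsec);
  return ns / iters;
}
```

Calling `time_memset_ns` for a range of sizes (for example 1 MiB up to 256 MiB) and rerunning the program with different `GLIBC_TUNABLES` settings gives a coarse comparison. This assumes the tunable from commit 46b5e98ef6f1 is exposed as `glibc.cpu.x86_memset_non_temporal_threshold`; check the glibc manual for your version before relying on that name.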

Comments

Noah Goldstein June 8, 2024, 7:10 p.m. UTC | #1
On Fri, Jun 7, 2024 at 6:04 PM Joe Damato <jdamato@fastly.com> wrote:
>
> In commit 46b5e98ef6f1 ("x86: Add seperate non-temporal tunable for
> memset") a tunable threshold for enabling non-temporal memset was added,
> but only for Intel hardware.
>
> Since that commit, new benchmark results suggest that non-temporal
> memset is beneficial on AMD, as well, so allow this tunable to be set
> for AMD.
>
> See:
> https://docs.google.com/spreadsheets/d/1opzukzvum4n6-RUVHTGddV6RjAEil4P2uMjjQGLbLcU/edit?usp=sharing
> which has been updated to include data using different strategies for
> large memset on AMD Zen2, Zen3, and Zen4.
>
> Signed-off-by: Joe Damato <jdamato@fastly.com>
> ---
>  sysdeps/x86/dl-cacheinfo.h | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/sysdeps/x86/dl-cacheinfo.h b/sysdeps/x86/dl-cacheinfo.h
> index d375a7cba6..d2fe61b997 100644
> --- a/sysdeps/x86/dl-cacheinfo.h
> +++ b/sysdeps/x86/dl-cacheinfo.h
> @@ -986,11 +986,11 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
>    if (CPU_FEATURE_USABLE_P (cpu_features, FSRM))
>      rep_movsb_threshold = 2112;
>
> -  /* Non-temporal stores in memset have only been tested on Intel hardware.
> -     Until we benchmark data on other x86 processor, disable non-temporal
> -     stores in memset. */
> +  /* Non-temporal stores are more performant on Intel and AMD hardware above
> +     non_temporal_threshold. Enable this for both Intel and AMD hardware. */
>    unsigned long int memset_non_temporal_threshold = SIZE_MAX;
> -  if (cpu_features->basic.kind == arch_kind_intel)
> +  if (cpu_features->basic.kind == arch_kind_intel
> +      || cpu_features->basic.kind == arch_kind_amd)
>        memset_non_temporal_threshold = non_temporal_threshold;
>
>     /* For AMD CPUs that support ERMS (Zen3+), REP MOVSB is in a lot of
> --
> 2.25.1
>

LGTM.

Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>

Carlos O'Donell June 10, 2024, 1:17 p.m. UTC | #2
On 6/8/24 3:10 PM, Noah Goldstein wrote:
> On Fri, Jun 7, 2024 at 6:04 PM Joe Damato <jdamato@fastly.com> wrote:
>>
>> In commit 46b5e98ef6f1 ("x86: Add seperate non-temporal tunable for
>> memset") a tunable threshold for enabling non-temporal memset was added,
>> but only for Intel hardware.
>>
>> Since that commit, new benchmark results suggest that non-temporal
>> memset is beneficial on AMD, as well, so allow this tunable to be set
>> for AMD.
>>
>> See:
>> https://docs.google.com/spreadsheets/d/1opzukzvum4n6-RUVHTGddV6RjAEil4P2uMjjQGLbLcU/edit?usp=sharing
>> which has been updated to include data using different stategies for
>> large memset on AMD Zen2, Zen3, and Zen4.
>>
>> Signed-off-by: Joe Damato <jdamato@fastly.com>
>> ---
>>  sysdeps/x86/dl-cacheinfo.h | 8 ++++----
>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/sysdeps/x86/dl-cacheinfo.h b/sysdeps/x86/dl-cacheinfo.h
>> index d375a7cba6..d2fe61b997 100644
>> --- a/sysdeps/x86/dl-cacheinfo.h
>> +++ b/sysdeps/x86/dl-cacheinfo.h
>> @@ -986,11 +986,11 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
>>    if (CPU_FEATURE_USABLE_P (cpu_features, FSRM))
>>      rep_movsb_threshold = 2112;
>>
>> -  /* Non-temporal stores in memset have only been tested on Intel hardware.
>> -     Until we benchmark data on other x86 processor, disable non-temporal
>> -     stores in memset. */
>> +  /* Non-temporal stores are more performant on Intel and AMD hardware above
>> +     non_temporal_threshold. Enable this for both Intel and AMD hardware. */
>>    unsigned long int memset_non_temporal_threshold = SIZE_MAX;
>> -  if (cpu_features->basic.kind == arch_kind_intel)
>> +  if (cpu_features->basic.kind == arch_kind_intel
>> +      || cpu_features->basic.kind == arch_kind_amd)
>>        memset_non_temporal_threshold = non_temporal_threshold;
>>
>>     /* For AMD CPUs that support ERMS (Zen3+), REP MOVSB is in a lot of
>> --
>> 2.25.1
>>
> 
> LGTM.
> 
> Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
> 

Noah, this passes all CI/CD; please feel free to push it with your RB line added.

Borislav Petkov June 19, 2024, 3:43 p.m. UTC | #3
On Fri, Jun 07, 2024 at 11:04:47PM +0000, Joe Damato wrote:
> In commit 46b5e98ef6f1 ("x86: Add seperate non-temporal tunable for
> memset") a tunable threshold for enabling non-temporal memset was added,
> but only for Intel hardware.
> 
> Since that commit, new benchmark results suggest that non-temporal
> memset is beneficial on AMD, as well, so allow this tunable to be set
> for AMD.
> 
> See:
> https://docs.google.com/spreadsheets/d/1opzukzvum4n6-RUVHTGddV6RjAEil4P2uMjjQGLbLcU/edit?usp=sharing

Say, are there docs somewhere explaining how those benchmarks are run, so
that I can run them myself?

Please CC me directly as I'm not subscribed to the libc ML.

Thx.

Patch

diff --git a/sysdeps/x86/dl-cacheinfo.h b/sysdeps/x86/dl-cacheinfo.h
index d375a7cba6..d2fe61b997 100644
--- a/sysdeps/x86/dl-cacheinfo.h
+++ b/sysdeps/x86/dl-cacheinfo.h
@@ -986,11 +986,11 @@  dl_init_cacheinfo (struct cpu_features *cpu_features)
   if (CPU_FEATURE_USABLE_P (cpu_features, FSRM))
     rep_movsb_threshold = 2112;
 
-  /* Non-temporal stores in memset have only been tested on Intel hardware.
-     Until we benchmark data on other x86 processor, disable non-temporal
-     stores in memset. */
+  /* Non-temporal stores are more performant on Intel and AMD hardware above
+     non_temporal_threshold. Enable this for both Intel and AMD hardware. */
   unsigned long int memset_non_temporal_threshold = SIZE_MAX;
-  if (cpu_features->basic.kind == arch_kind_intel)
+  if (cpu_features->basic.kind == arch_kind_intel
+      || cpu_features->basic.kind == arch_kind_amd)
       memset_non_temporal_threshold = non_temporal_threshold;
 
    /* For AMD CPUs that support ERMS (Zen3+), REP MOVSB is in a lot of
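
The effect of the hunk above can be sketched outside glibc as a small standalone function. The enum `cpu_kind` and the function names here are hypothetical stand-ins for glibc's internal `cpu_features->basic.kind` check; only the selection logic mirrors the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for glibc's internal arch_kind_* values.  */
enum cpu_kind { KIND_INTEL, KIND_AMD, KIND_OTHER };

/* Mirror of the patched logic: non-temporal memset stays disabled
   (threshold SIZE_MAX) unless the CPU is Intel or AMD, in which case
   the general non-temporal threshold is reused.  */
static size_t
pick_memset_nt_threshold (enum cpu_kind kind, size_t non_temporal_threshold)
{
  size_t memset_non_temporal_threshold = SIZE_MAX;
  if (kind == KIND_INTEL || kind == KIND_AMD)
    memset_non_temporal_threshold = non_temporal_threshold;
  return memset_non_temporal_threshold;
}

/* Usage example: the threshold value 4096 is arbitrary.  */
static void
check_threshold_selection (void)
{
  assert (pick_memset_nt_threshold (KIND_INTEL, 4096) == 4096);
  assert (pick_memset_nt_threshold (KIND_AMD, 4096) == 4096);
  assert (pick_memset_nt_threshold (KIND_OTHER, 4096) == SIZE_MAX);
}
```

Before the patch, only the Intel case returned `non_temporal_threshold`; AMD fell through to `SIZE_MAX`, which effectively disabled the non-temporal path for memset.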