Message ID | 20240607230447.52478-1-jdamato@fastly.com |
---|---|
State | New |
Series | x86: Enable non-temporal memset tunable for AMD |
On Fri, Jun 7, 2024 at 6:04 PM Joe Damato <jdamato@fastly.com> wrote:
>
> In commit 46b5e98ef6f1 ("x86: Add seperate non-temporal tunable for
> memset") a tunable threshold for enabling non-temporal memset was added,
> but only for Intel hardware.
>
> Since that commit, new benchmark results suggest that non-temporal
> memset is beneficial on AMD, as well, so allow this tunable to be set
> for AMD.
>
> See:
> https://docs.google.com/spreadsheets/d/1opzukzvum4n6-RUVHTGddV6RjAEil4P2uMjjQGLbLcU/edit?usp=sharing
> which has been updated to include data using different strategies for
> large memset on AMD Zen2, Zen3, and Zen4.
>
> Signed-off-by: Joe Damato <jdamato@fastly.com>
> ---
>  sysdeps/x86/dl-cacheinfo.h | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/sysdeps/x86/dl-cacheinfo.h b/sysdeps/x86/dl-cacheinfo.h
> index d375a7cba6..d2fe61b997 100644
> --- a/sysdeps/x86/dl-cacheinfo.h
> +++ b/sysdeps/x86/dl-cacheinfo.h
> @@ -986,11 +986,11 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
>    if (CPU_FEATURE_USABLE_P (cpu_features, FSRM))
>      rep_movsb_threshold = 2112;
>
> -  /* Non-temporal stores in memset have only been tested on Intel hardware.
> -     Until we benchmark data on other x86 processor, disable non-temporal
> -     stores in memset.  */
> +  /* Non-temporal stores are more performant on Intel and AMD hardware above
> +     non_temporal_threshold. Enable this for both Intel and AMD hardware.  */
>    unsigned long int memset_non_temporal_threshold = SIZE_MAX;
> -  if (cpu_features->basic.kind == arch_kind_intel)
> +  if (cpu_features->basic.kind == arch_kind_intel
> +      || cpu_features->basic.kind == arch_kind_amd)
>      memset_non_temporal_threshold = non_temporal_threshold;
>
>    /* For AMD CPUs that support ERMS (Zen3+), REP MOVSB is in a lot of
> --
> 2.25.1
>

LGTM.

Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
On 6/8/24 3:10 PM, Noah Goldstein wrote:
> On Fri, Jun 7, 2024 at 6:04 PM Joe Damato <jdamato@fastly.com> wrote:
>>
>> In commit 46b5e98ef6f1 ("x86: Add seperate non-temporal tunable for
>> memset") a tunable threshold for enabling non-temporal memset was added,
>> but only for Intel hardware.
>>
>> Since that commit, new benchmark results suggest that non-temporal
>> memset is beneficial on AMD, as well, so allow this tunable to be set
>> for AMD.
>>
>> See:
>> https://docs.google.com/spreadsheets/d/1opzukzvum4n6-RUVHTGddV6RjAEil4P2uMjjQGLbLcU/edit?usp=sharing
>> which has been updated to include data using different strategies for
>> large memset on AMD Zen2, Zen3, and Zen4.
>>
>> Signed-off-by: Joe Damato <jdamato@fastly.com>
>> ---
>>  sysdeps/x86/dl-cacheinfo.h | 8 ++++----
>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/sysdeps/x86/dl-cacheinfo.h b/sysdeps/x86/dl-cacheinfo.h
>> index d375a7cba6..d2fe61b997 100644
>> --- a/sysdeps/x86/dl-cacheinfo.h
>> +++ b/sysdeps/x86/dl-cacheinfo.h
>> @@ -986,11 +986,11 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
>>    if (CPU_FEATURE_USABLE_P (cpu_features, FSRM))
>>      rep_movsb_threshold = 2112;
>>
>> -  /* Non-temporal stores in memset have only been tested on Intel hardware.
>> -     Until we benchmark data on other x86 processor, disable non-temporal
>> -     stores in memset.  */
>> +  /* Non-temporal stores are more performant on Intel and AMD hardware above
>> +     non_temporal_threshold. Enable this for both Intel and AMD hardware.  */
>>    unsigned long int memset_non_temporal_threshold = SIZE_MAX;
>> -  if (cpu_features->basic.kind == arch_kind_intel)
>> +  if (cpu_features->basic.kind == arch_kind_intel
>> +      || cpu_features->basic.kind == arch_kind_amd)
>>      memset_non_temporal_threshold = non_temporal_threshold;
>>
>>    /* For AMD CPUs that support ERMS (Zen3+), REP MOVSB is in a lot of
>> --
>> 2.25.1
>>
>
> LGTM.
>
> Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
>

Noah,

This passes all CI/CD, please feel free to push this with your RB line added.
On Fri, Jun 07, 2024 at 11:04:47PM +0000, Joe Damato wrote:
> In commit 46b5e98ef6f1 ("x86: Add seperate non-temporal tunable for
> memset") a tunable threshold for enabling non-temporal memset was added,
> but only for Intel hardware.
>
> Since that commit, new benchmark results suggest that non-temporal
> memset is beneficial on AMD, as well, so allow this tunable to be set
> for AMD.
>
> See:
> https://docs.google.com/spreadsheets/d/1opzukzvum4n6-RUVHTGddV6RjAEil4P2uMjjQGLbLcU/edit?usp=sharing

Say, is there some docs somewhere explaining how those benchmarks are run so
that I can do them myself?

Please CC me directly as I'm not subscribed to the libc ML.

Thx.
diff --git a/sysdeps/x86/dl-cacheinfo.h b/sysdeps/x86/dl-cacheinfo.h
index d375a7cba6..d2fe61b997 100644
--- a/sysdeps/x86/dl-cacheinfo.h
+++ b/sysdeps/x86/dl-cacheinfo.h
@@ -986,11 +986,11 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
   if (CPU_FEATURE_USABLE_P (cpu_features, FSRM))
     rep_movsb_threshold = 2112;
 
-  /* Non-temporal stores in memset have only been tested on Intel hardware.
-     Until we benchmark data on other x86 processor, disable non-temporal
-     stores in memset.  */
+  /* Non-temporal stores are more performant on Intel and AMD hardware above
+     non_temporal_threshold. Enable this for both Intel and AMD hardware.  */
   unsigned long int memset_non_temporal_threshold = SIZE_MAX;
-  if (cpu_features->basic.kind == arch_kind_intel)
+  if (cpu_features->basic.kind == arch_kind_intel
+      || cpu_features->basic.kind == arch_kind_amd)
     memset_non_temporal_threshold = non_temporal_threshold;
 
   /* For AMD CPUs that support ERMS (Zen3+), REP MOVSB is in a lot of
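The effect of the hunk can be read in isolation as a small selection function. The following is only a sketch of that logic outside glibc: the `cpu_vendor` enum and `pick_memset_nt_threshold` are hypothetical stand-ins for `cpu_features->basic.kind` and the inline code in `dl_init_cacheinfo`, not names from the tree.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for glibc's internal arch_kind_* values.  */
enum cpu_vendor { VENDOR_INTEL, VENDOR_AMD, VENDOR_OTHER };

/* Return the size above which memset would use non-temporal stores.
   SIZE_MAX means the non-temporal path is effectively never taken.  */
static size_t
pick_memset_nt_threshold (enum cpu_vendor vendor,
                          size_t non_temporal_threshold)
{
  size_t threshold = SIZE_MAX;

  /* After the patch, AMD is treated the same way as Intel.  */
  if (vendor == VENDOR_INTEL || vendor == VENDOR_AMD)
    threshold = non_temporal_threshold;

  return threshold;
}
```

With this shape, any vendor other than Intel or AMD keeps `SIZE_MAX`, so its memset behaviour is unchanged by the patch.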
In commit 46b5e98ef6f1 ("x86: Add seperate non-temporal tunable for
memset") a tunable threshold for enabling non-temporal memset was added,
but only for Intel hardware.

Since that commit, new benchmark results suggest that non-temporal
memset is beneficial on AMD, as well, so allow this tunable to be set
for AMD.

See:
https://docs.google.com/spreadsheets/d/1opzukzvum4n6-RUVHTGddV6RjAEil4P2uMjjQGLbLcU/edit?usp=sharing
which has been updated to include data using different strategies for
large memset on AMD Zen2, Zen3, and Zen4.

Signed-off-by: Joe Damato <jdamato@fastly.com>
---
 sysdeps/x86/dl-cacheinfo.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
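The thread does not describe the harness behind the linked spreadsheet. As a rough stand-in only, and not the method used for those numbers, a large-memset timing loop might look like the sketch below. `time_memset_ns` is a hypothetical helper; it uses C11 `timespec_get` (wall-clock, not monotonic) and a GCC/Clang inline-asm barrier, both of which are assumptions about the build environment.

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: average wall-clock nanoseconds per memset call of
   `size` bytes over `iters` iterations.  Returns a negative value if the
   buffer cannot be allocated or `iters` is not positive.  */
static double
time_memset_ns (size_t size, int iters)
{
  unsigned char *buf = malloc (size);
  if (buf == NULL || iters <= 0)
    {
      free (buf);
      return -1.0;
    }

  struct timespec start, end;
  timespec_get (&start, TIME_UTC);
  for (int i = 0; i < iters; i++)
    {
      memset (buf, i & 0xff, size);
      /* Compiler barrier (GCC/Clang extension) so the stores are kept.  */
      __asm__ volatile ("" : : "r" (buf) : "memory");
    }
  timespec_get (&end, TIME_UTC);

  free (buf);
  return ((double) (end.tv_sec - start.tv_sec) * 1e9
          + (double) (end.tv_nsec - start.tv_nsec)) / iters;
}
```

Running this for sizes on either side of the threshold, for example `time_memset_ns (64u << 20, 16)`, gives a coarse view of where the non-temporal path starts to matter. The threshold itself should be adjustable per run through `GLIBC_TUNABLES`, assuming the tunable added by commit 46b5e98ef6f1 is exposed as `glibc.cpu.x86_memset_non_temporal_threshold`; check the tunables list of the installed glibc before relying on that name.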