Message ID | 20230509031020.3496291-1-goldstein.w.n@gmail.com |
---|---|
State | New |
Headers | show |
Series | [v1] x86: Use 64MB as nt-store threshold if no cacheinfo [BZ #30429] | expand |
* Noah Goldstein via Libc-alpha: > If `non_temporal_threshold` is below `minimum_non_temporal_threshold`, > it almost certainly means we failed to read the systems cache info. > > In this case, rather than defaulting the minimum correct value, we > should default to a value that gets at least reasonable > performance. 64MB is chosen conservatively to be at the very high > end. This should never cause non-temporal stores when, if we had read > cache info, we wouldn't have otherwise. I think that's quite surprising for GLIBC_TUNABLES. Maybe that logic should only activate if the default was set from cache sizes? Thanks, Florian
On Tue, May 9, 2023 at 3:56 AM Florian Weimer <fweimer@redhat.com> wrote: > > * Noah Goldstein via Libc-alpha: > > > If `non_temporal_threshold` is below `minimum_non_temporal_threshold`, > > it almost certainly means we failed to read the systems cache info. > > > > In this case, rather than defaulting the minimum correct value, we > > should default to a value that gets at least reasonable > > performance. 64MB is chosen conservatively to be at the very high > > end. This should never cause non-temporal stores when, if we had read > > cache info, we wouldn't have otherwise. > > I think that's quite surprising for GLIBC_TUNABLES. Maybe that logic > should only activate if the default was set from cache sizes? > I don't quite understand what you mean by "only active if the default was set from cache sizes"? This logic only triggers if total_cachesize / 8 < ~16kb. I think this should only ever really happen if we failed to read cache info. What is the surprise? > Thanks, > Florian >
On Tue, May 9, 2023 at 11:01 AM Noah Goldstein <goldstein.w.n@gmail.com> wrote: > > On Tue, May 9, 2023 at 3:56 AM Florian Weimer <fweimer@redhat.com> wrote: > > > > * Noah Goldstein via Libc-alpha: > > > > > If `non_temporal_threshold` is below `minimum_non_temporal_threshold`, > > > it almost certainly means we failed to read the systems cache info. > > > > > > In this case, rather than defaulting the minimum correct value, we > > > should default to a value that gets at least reasonable > > > performance. 64MB is chosen conservatively to be at the very high > > > end. This should never cause non-temporal stores when, if we had read > > > cache info, we wouldn't have otherwise. > > > > I think that's quite surprising for GLIBC_TUNABLES. Maybe that logic > > should only activate if the default was set from cache sizes? > > > I don't quite understand what you mean by "only active if the default > was set from cache sizes"? > > This logic only triggers if total_cachesize / 8 < ~16kb. I think this should > only ever really happen if we failed to read cache info. What is the > surprise? florian? > > > Thanks, > > Florian > >
* Noah Goldstein: > On Tue, May 9, 2023 at 3:56 AM Florian Weimer <fweimer@redhat.com> wrote: >> >> * Noah Goldstein via Libc-alpha: >> >> > If `non_temporal_threshold` is below `minimum_non_temporal_threshold`, >> > it almost certainly means we failed to read the systems cache info. >> > >> > In this case, rather than defaulting the minimum correct value, we >> > should default to a value that gets at least reasonable >> > performance. 64MB is chosen conservatively to be at the very high >> > end. This should never cause non-temporal stores when, if we had read >> > cache info, we wouldn't have otherwise. >> >> I think that's quite surprising for GLIBC_TUNABLES. Maybe that logic >> should only activate if the default was set from cache sizes? >> > I don't quite understand what you mean by "only active if the default > was set from cache sizes"? > > This logic only triggers if total_cachesize / 8 < ~16kb. I think this should > only ever really happen if we failed to read cache info. What is the > surprise? Right, my mistake. The patch should work as-is. Reviewed-by: Florian Weimer <fweimer@redhat.com> Thanks, Florian
On Mon, May 8, 2023 at 8:10 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote: > > If `non_temporal_threshold` is below `minimum_non_temporal_threshold`, > it almost certainly means we failed to read the systems cache info. > > In this case, rather than defaulting the minimum correct value, we > should default to a value that gets at least reasonable > performance. 64MB is chosen conservatively to be at the very high > end. This should never cause non-temporal stores when, if we had read > cache info, we wouldn't have otherwise. > --- > sysdeps/x86/dl-cacheinfo.h | 10 +++++++++- > 1 file changed, 9 insertions(+), 1 deletion(-) > > diff --git a/sysdeps/x86/dl-cacheinfo.h b/sysdeps/x86/dl-cacheinfo.h > index 864b00a521..6225c852f6 100644 > --- a/sysdeps/x86/dl-cacheinfo.h > +++ b/sysdeps/x86/dl-cacheinfo.h > @@ -771,8 +771,16 @@ dl_init_cacheinfo (struct cpu_features *cpu_features) > reflected in the manual. */ > unsigned long int maximum_non_temporal_threshold = SIZE_MAX >> 4; > unsigned long int minimum_non_temporal_threshold = 0x4040; > + > + /* If `non_temporal_threshold` less than `minimum_non_temporal_threshold` > + it most likely means we failed to detect the cache info. We don't want > + to default to `minimum_non_temporal_threshold` as such a small value, > + while correct, has bad performance. We default to 64MB as reasonable > + default bound. 64MB is likely conservative in that most/all systems would > + choose a lower value so it should never forcing non-temporal stores when > + they otherwise wouldn't be used. */ > if (non_temporal_threshold < minimum_non_temporal_threshold) > - non_temporal_threshold = minimum_non_temporal_threshold; > + non_temporal_threshold = 64 * 1024 * 1024; > else if (non_temporal_threshold > maximum_non_temporal_threshold) > non_temporal_threshold = maximum_non_temporal_threshold; > > -- > 2.34.1 > LGTM. Thanks.
On Mon, May 8, 2023 at 10:10 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote: > > If `non_temporal_threshold` is below `minimum_non_temporal_threshold`, > it almost certainly means we failed to read the systems cache info. > > In this case, rather than defaulting the minimum correct value, we > should default to a value that gets at least reasonable > performance. 64MB is chosen conservatively to be at the very high > end. This should never cause non-temporal stores when, if we had read > cache info, we wouldn't have otherwise. > --- > sysdeps/x86/dl-cacheinfo.h | 10 +++++++++- > 1 file changed, 9 insertions(+), 1 deletion(-) > > diff --git a/sysdeps/x86/dl-cacheinfo.h b/sysdeps/x86/dl-cacheinfo.h > index 864b00a521..6225c852f6 100644 > --- a/sysdeps/x86/dl-cacheinfo.h > +++ b/sysdeps/x86/dl-cacheinfo.h > @@ -771,8 +771,16 @@ dl_init_cacheinfo (struct cpu_features *cpu_features) > reflected in the manual. */ > unsigned long int maximum_non_temporal_threshold = SIZE_MAX >> 4; > unsigned long int minimum_non_temporal_threshold = 0x4040; > + > + /* If `non_temporal_threshold` less than `minimum_non_temporal_threshold` > + it most likely means we failed to detect the cache info. We don't want > + to default to `minimum_non_temporal_threshold` as such a small value, > + while correct, has bad performance. We default to 64MB as reasonable > + default bound. 64MB is likely conservative in that most/all systems would > + choose a lower value so it should never forcing non-temporal stores when > + they otherwise wouldn't be used. */ > if (non_temporal_threshold < minimum_non_temporal_threshold) > - non_temporal_threshold = minimum_non_temporal_threshold; > + non_temporal_threshold = 64 * 1024 * 1024; > else if (non_temporal_threshold > maximum_non_temporal_threshold) > non_temporal_threshold = maximum_non_temporal_threshold; > > -- > 2.34.1 > I want to backport down to 2.28. Thoughts?
On Mon, Jun 5, 2023 at 10:15 AM Noah Goldstein <goldstein.w.n@gmail.com> wrote: > > On Mon, May 8, 2023 at 10:10 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote: > > > > If `non_temporal_threshold` is below `minimum_non_temporal_threshold`, > > it almost certainly means we failed to read the systems cache info. > > > > In this case, rather than defaulting the minimum correct value, we > > should default to a value that gets at least reasonable > > performance. 64MB is chosen conservatively to be at the very high > > end. This should never cause non-temporal stores when, if we had read > > cache info, we wouldn't have otherwise. > > --- > > sysdeps/x86/dl-cacheinfo.h | 10 +++++++++- > > 1 file changed, 9 insertions(+), 1 deletion(-) > > > > diff --git a/sysdeps/x86/dl-cacheinfo.h b/sysdeps/x86/dl-cacheinfo.h > > index 864b00a521..6225c852f6 100644 > > --- a/sysdeps/x86/dl-cacheinfo.h > > +++ b/sysdeps/x86/dl-cacheinfo.h > > @@ -771,8 +771,16 @@ dl_init_cacheinfo (struct cpu_features *cpu_features) > > reflected in the manual. */ > > unsigned long int maximum_non_temporal_threshold = SIZE_MAX >> 4; > > unsigned long int minimum_non_temporal_threshold = 0x4040; > > + > > + /* If `non_temporal_threshold` less than `minimum_non_temporal_threshold` > > + it most likely means we failed to detect the cache info. We don't want > > + to default to `minimum_non_temporal_threshold` as such a small value, > > + while correct, has bad performance. We default to 64MB as reasonable > > + default bound. 64MB is likely conservative in that most/all systems would > > + choose a lower value so it should never forcing non-temporal stores when > > + they otherwise wouldn't be used. */ > > if (non_temporal_threshold < minimum_non_temporal_threshold) > > - non_temporal_threshold = minimum_non_temporal_threshold; > > + non_temporal_threshold = 64 * 1024 * 1024; > > else if (non_temporal_threshold > maximum_non_temporal_threshold) > > non_temporal_threshold = maximum_non_temporal_threshold; > > > > -- > > 2.34.1 > > > > I want to backport down to 2.28. > Thoughts? Who will use such backport?
On Mon, Jun 5, 2023 at 12:46 PM H.J. Lu <hjl.tools@gmail.com> wrote: > > On Mon, Jun 5, 2023 at 10:15 AM Noah Goldstein <goldstein.w.n@gmail.com> wrote: > > > > On Mon, May 8, 2023 at 10:10 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote: > > > > > > If `non_temporal_threshold` is below `minimum_non_temporal_threshold`, > > > it almost certainly means we failed to read the systems cache info. > > > > > > In this case, rather than defaulting the minimum correct value, we > > > should default to a value that gets at least reasonable > > > performance. 64MB is chosen conservatively to be at the very high > > > end. This should never cause non-temporal stores when, if we had read > > > cache info, we wouldn't have otherwise. > > > --- > > > sysdeps/x86/dl-cacheinfo.h | 10 +++++++++- > > > 1 file changed, 9 insertions(+), 1 deletion(-) > > > > > > diff --git a/sysdeps/x86/dl-cacheinfo.h b/sysdeps/x86/dl-cacheinfo.h > > > index 864b00a521..6225c852f6 100644 > > > --- a/sysdeps/x86/dl-cacheinfo.h > > > +++ b/sysdeps/x86/dl-cacheinfo.h > > > @@ -771,8 +771,16 @@ dl_init_cacheinfo (struct cpu_features *cpu_features) > > > reflected in the manual. */ > > > unsigned long int maximum_non_temporal_threshold = SIZE_MAX >> 4; > > > unsigned long int minimum_non_temporal_threshold = 0x4040; > > > + > > > + /* If `non_temporal_threshold` less than `minimum_non_temporal_threshold` > > > + it most likely means we failed to detect the cache info. We don't want > > > + to default to `minimum_non_temporal_threshold` as such a small value, > > > + while correct, has bad performance. We default to 64MB as reasonable > > > + default bound. 64MB is likely conservative in that most/all systems would > > > + choose a lower value so it should never forcing non-temporal stores when > > > + they otherwise wouldn't be used. */ > > > if (non_temporal_threshold < minimum_non_temporal_threshold) > > > - non_temporal_threshold = minimum_non_temporal_threshold; > > > + non_temporal_threshold = 64 * 1024 * 1024; > > > else if (non_temporal_threshold > maximum_non_temporal_threshold) > > > non_temporal_threshold = maximum_non_temporal_threshold; > > > > > > -- > > > 2.34.1 > > > > > > > I want to backport down to 2.28. > > Thoughts? > > Who will use such backport? People using systems where we miscalculate cache size. > > -- > H.J.
diff --git a/sysdeps/x86/dl-cacheinfo.h b/sysdeps/x86/dl-cacheinfo.h index 864b00a521..6225c852f6 100644 --- a/sysdeps/x86/dl-cacheinfo.h +++ b/sysdeps/x86/dl-cacheinfo.h @@ -771,8 +771,16 @@ dl_init_cacheinfo (struct cpu_features *cpu_features) reflected in the manual. */ unsigned long int maximum_non_temporal_threshold = SIZE_MAX >> 4; unsigned long int minimum_non_temporal_threshold = 0x4040; + + /* If `non_temporal_threshold` less than `minimum_non_temporal_threshold` + it most likely means we failed to detect the cache info. We don't want + to default to `minimum_non_temporal_threshold` as such a small value, + while correct, has bad performance. We default to 64MB as reasonable + default bound. 64MB is likely conservative in that most/all systems would + choose a lower value so it should never forcing non-temporal stores when + they otherwise wouldn't be used. */ if (non_temporal_threshold < minimum_non_temporal_threshold) - non_temporal_threshold = minimum_non_temporal_threshold; + non_temporal_threshold = 64 * 1024 * 1024; else if (non_temporal_threshold > maximum_non_temporal_threshold) non_temporal_threshold = maximum_non_temporal_threshold;