| Message ID | 20240626024649.3689-1-MayShao-oc@zhaoxin.com |
| --- | --- |
| State | New |
| Series | [1/3] x86:Set preferred CPU features on the KH-40000 and KX-7000 Zhaoxin processors |
On Wed, 2024-06-26 at 10:46 +0800, MayShao wrote:
> From: MayShao <mayshao-oc@zhaoxin.com>
>
> Fix code indentation issues under the Zhaoxin branch.
>
> Unaligned AVX loads are slower on KH-40000 and KX-7000, so disable
> the AVX_Fast_Unaligned_Load feature.
>
> Enable the Prefer_No_VZEROUPPER and Fast_Unaligned_Load features to
> use the sse2_unaligned versions of memset, strcpy and strcat.
> ---
>  sysdeps/x86/cpu-features.c | 66 ++++++++++++++++++++++----------------
>  1 file changed, 39 insertions(+), 27 deletions(-)
>
> diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
> index 3d7c2819d7..24fbf699b9 100644
> --- a/sysdeps/x86/cpu-features.c
> +++ b/sysdeps/x86/cpu-features.c
> @@ -1015,7 +1015,7 @@ https://www.intel.com/content/www/us/en/support/articles/000059422/processors.ht
>        kind = arch_kind_zhaoxin;
>
>        get_common_indices (cpu_features, &family, &model, &extended_model,
> -                          &stepping);
> +                          &stepping);

Don't randomly change tabs to spaces.
On Wed, Jun 26, 2024 at 10:56 AM Xi Ruoyao <xry111@xry111.site> wrote:
>
> On Wed, 2024-06-26 at 10:46 +0800, MayShao wrote:
> > From: MayShao <mayshao-oc@zhaoxin.com>
> >
> > Fix code indentation issues under the Zhaoxin branch.
> >
> > Unaligned AVX loads are slower on KH-40000 and KX-7000, so disable
> > the AVX_Fast_Unaligned_Load feature.
> >
> > Enable the Prefer_No_VZEROUPPER and Fast_Unaligned_Load features to
> > use the sse2_unaligned versions of memset, strcpy and strcat.
> > ---
> >  sysdeps/x86/cpu-features.c | 66 ++++++++++++++++++++++----------------
> >  1 file changed, 39 insertions(+), 27 deletions(-)
> >
> > diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
> > index 3d7c2819d7..24fbf699b9 100644
> > --- a/sysdeps/x86/cpu-features.c
> > +++ b/sysdeps/x86/cpu-features.c
> > @@ -1015,7 +1015,7 @@ https://www.intel.com/content/www/us/en/support/articles/000059422/processors.ht
> >        kind = arch_kind_zhaoxin;
> >
> >        get_common_indices (cpu_features, &family, &model, &extended_model,
> > -                          &stepping);
> > +                          &stepping);
>
> Don't randomly change tabs to spaces.

We have a clang-format file, you should run it over the entire patch.

Also, can you comment or enum the different models?

> --
> Xi Ruoyao <xry111@xry111.site>
> School of Aerospace Science and Technology, Xidian University
On Wed, Jun 26, 2024 at 11:26 AM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
>
> On Wed, Jun 26, 2024 at 10:56 AM Xi Ruoyao <xry111@xry111.site> wrote:
> >
> > On Wed, 2024-06-26 at 10:46 +0800, MayShao wrote:
> > > From: MayShao <mayshao-oc@zhaoxin.com>
> > >
> > > Fix code indentation issues under the Zhaoxin branch.
> > >
> > > Unaligned AVX loads are slower on KH-40000 and KX-7000, so disable
> > > the AVX_Fast_Unaligned_Load feature.
> > >
> > > Enable the Prefer_No_VZEROUPPER and Fast_Unaligned_Load features to
> > > use the sse2_unaligned versions of memset, strcpy and strcat.
> > > ---
> > >  sysdeps/x86/cpu-features.c | 66 ++++++++++++++++++++++----------------
> > >  1 file changed, 39 insertions(+), 27 deletions(-)
> > >
> > > diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
> > > index 3d7c2819d7..24fbf699b9 100644
> > > --- a/sysdeps/x86/cpu-features.c
> > > +++ b/sysdeps/x86/cpu-features.c
> > > @@ -1015,7 +1015,7 @@ https://www.intel.com/content/www/us/en/support/articles/000059422/processors.ht
> > >        kind = arch_kind_zhaoxin;
> > >
> > >        get_common_indices (cpu_features, &family, &model, &extended_model,
> > > -                          &stepping);
> > > +                          &stepping);
> >
> > Don't randomly change tabs to spaces.
>
> We have a clang-format file, you should run it over the entire patch.
>
Ack. I will fix this.

> Also, can you comment or enum the different models?
Will fix it.

> >
> > --
> > Xi Ruoyao <xry111@xry111.site>
> > School of Aerospace Science and Technology, Xidian University
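For reference, below is a minimal sketch of the kind of named constants Noah is asking for, assuming the 0x5b/0x6b model IDs correspond to KH-40000 and KX-7000 as the commit message implies; the identifiers, and the entries for the older models, are placeholders rather than confirmed Zhaoxin documentation.

```c
/* Hypothetical sketch only: named model IDs for the family 0x7 branch.
   The 0x5b/0x6b association with KH-40000/KX-7000 follows the commit
   message; the remaining names are placeholders, not confirmed.  */
enum
{
  zhaoxin_model_0x1b     = 0x1b,   /* earlier model, AVX disabled below.  */
  zhaoxin_model_0x3b     = 0x3b,   /* earlier model, AVX disabled below.  */
  zhaoxin_model_kh_40000 = 0x5b,   /* presumed KH-40000.  */
  zhaoxin_model_kx_7000  = 0x6b    /* presumed KX-7000.  */
};
```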
* MayShao:

> From: MayShao <mayshao-oc@zhaoxin.com>
>
> Fix code indentation issues under the Zhaoxin branch.
>
> Unaligned AVX loads are slower on KH-40000 and KX-7000, so disable
> the AVX_Fast_Unaligned_Load feature.
>
> Enable the Prefer_No_VZEROUPPER and Fast_Unaligned_Load features to
> use the sse2_unaligned versions of memset, strcpy and strcat.

Somewhat related to that, do you have documentation of the behavior of
*aligned* 128-bit loads?  Are they guaranteed to be atomic?  At least if
MOVAPD, MOVAPS, MOVDQA are used?

Thanks,
Florian
On Wed, Jun 16, 2024 7:01 PM Florian Weimer <fweimer@redhat.com> wrote:
>
> * MayShao:
>
> > From: MayShao <mayshao-oc@zhaoxin.com>
> >
> > Fix code indentation issues under the Zhaoxin branch.
> >
> > Unaligned AVX loads are slower on KH-40000 and KX-7000, so disable
> > the AVX_Fast_Unaligned_Load feature.
> >
> > Enable the Prefer_No_VZEROUPPER and Fast_Unaligned_Load features to
> > use the sse2_unaligned versions of memset, strcpy and strcat.
>
> Somewhat related to that, do you have documentation of the behavior of
> *aligned* 128-bit loads?  Are they guaranteed to be atomic?
> At least if MOVAPD, MOVAPS, MOVDQA are used?

I can confirm that aligned 128-bit loads (such as MOVAPD, MOVAPS,
MOVDQA) in the WB memory region are atomic, and for unaligned 128-bit
loads, it can also be guaranteed to be atomic if within a cacheline.

>
> Thanks,
> Florian
>
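To make concrete what kind of load is being discussed, here is a small standalone sketch (not part of the patch) of a 16-byte-aligned SSE load; its atomicity is exactly the vendor-specific guarantee confirmed above for WB memory, not something the x86 ISA promises on every CPU.

```c
#include <emmintrin.h>
#include <stdint.h>

/* A 16-byte value, aligned so MOVDQA/MOVAPS may legally be used on it.  */
typedef struct { _Alignas (16) uint64_t lo, hi; } u128;

static inline __m128i
load_u128 (const u128 *p)
{
  /* Typically compiles to MOVDQA (or VMOVDQA when AVX is enabled).
     Per the statement above, this load is atomic on these Zhaoxin CPUs
     when the memory is write-back (WB) cacheable.  */
  return _mm_load_si128 ((const __m128i *) p);
}
```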
> On Wed, Jun 16, 2024 7:01 PM Florian Weimer <fweimer@redhat.com> wrote:
>>
>> * MayShao:
>>
>> > From: MayShao <mayshao-oc@zhaoxin.com>
>> >
>> > Fix code indentation issues under the Zhaoxin branch.
>> >
>> > Unaligned AVX loads are slower on KH-40000 and KX-7000, so disable
>> > the AVX_Fast_Unaligned_Load feature.
>> >
>> > Enable the Prefer_No_VZEROUPPER and Fast_Unaligned_Load features to
>> > use the sse2_unaligned versions of memset, strcpy and strcat.
>>
>> Somewhat related to that, do you have documentation of the behavior of
>> *aligned* 128-bit loads?  Are they guaranteed to be atomic?
>> At least if MOVAPD, MOVAPS, MOVDQA are used?
>
> I can confirm that aligned 128-bit loads (such as MOVAPD, MOVAPS,
> MOVDQA) in the WB memory region are atomic, and for unaligned 128-bit
> loads, it can also be guaranteed to be atomic if within a cacheline.

This is great news.  Could you update this GCC bug with the information?

  Bug 104688 - gcc and libatomic can use SSE for 128-bit atomic loads
  on Intel and AMD CPUs with AVX
  <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688>

I think it means we can teach GCC to use 128-bit atomic loads
unconditionally for AVX targets (bypassing libatomic).

Thanks,
Florian
On Thu, 2024-06-27 at 08:32 +0200, Florian Weimer wrote:
> > On Wed, Jun 16, 2024 7:01 PM Florian Weimer <fweimer@redhat.com> wrote:
> > >
> > > * MayShao:
> > >
> > > > From: MayShao <mayshao-oc@zhaoxin.com>
> > > >
> > > > Fix code indentation issues under the Zhaoxin branch.
> > > >
> > > > Unaligned AVX loads are slower on KH-40000 and KX-7000, so disable
> > > > the AVX_Fast_Unaligned_Load feature.
> > > >
> > > > Enable the Prefer_No_VZEROUPPER and Fast_Unaligned_Load features to
> > > > use the sse2_unaligned versions of memset, strcpy and strcat.
> > >
> > > Somewhat related to that, do you have documentation of the behavior of
> > > *aligned* 128-bit loads?  Are they guaranteed to be atomic?
> > > At least if MOVAPD, MOVAPS, MOVDQA are used?
> >
> > I can confirm that aligned 128-bit loads (such as MOVAPD, MOVAPS,
> > MOVDQA) in the WB memory region are atomic, and for unaligned 128-bit
> > loads, it can also be guaranteed to be atomic if within a cacheline.
>
> This is great news.  Could you update this GCC bug with the
> information?
>
>   Bug 104688 - gcc and libatomic can use SSE for 128-bit atomic loads
>   on Intel and AMD CPUs with AVX
>   <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688>
>
> I think it means we can teach GCC to use 128-bit atomic loads
> unconditionally for AVX targets (bypassing libatomic).

The info is already there.  But the discussion stalled after some
concern about non-Write-Back memory regions was raised.
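For context, this sketch (not code from the bug report) shows the C-level operation bug 104688 is about: whether a 16-byte atomic load like this is inlined as a single aligned vector move or routed through libatomic depends on the GCC version and target flags, and the non-WB-memory concern mentioned above is what stalled making the inline form unconditional for AVX targets.

```c
#include <stdatomic.h>

/* 128-bit atomic object (GCC's __int128 extension).  With AVX available,
   GCC could in principle implement this load as one aligned 16-byte
   vector move instead of calling libatomic, relying on the vendor
   atomicity guarantee for WB memory discussed in this thread.  */
static _Atomic __int128 shared_value;

__int128
read_shared_value (void)
{
  return atomic_load_explicit (&shared_value, memory_order_acquire);
}
```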
diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
index 3d7c2819d7..24fbf699b9 100644
--- a/sysdeps/x86/cpu-features.c
+++ b/sysdeps/x86/cpu-features.c
@@ -1015,7 +1015,7 @@ https://www.intel.com/content/www/us/en/support/articles/000059422/processors.ht
       kind = arch_kind_zhaoxin;
 
       get_common_indices (cpu_features, &family, &model, &extended_model,
-                          &stepping);
+                          &stepping);
 
       get_extended_indices (cpu_features);
@@ -1026,38 +1026,50 @@ https://www.intel.com/content/www/us/en/support/articles/000059422/processors.ht
         {
           if (model == 0xf || model == 0x19)
             {
-              CPU_FEATURE_UNSET (cpu_features, AVX);
-              CPU_FEATURE_UNSET (cpu_features, AVX2);
+              CPU_FEATURE_UNSET (cpu_features, AVX);
+              CPU_FEATURE_UNSET (cpu_features, AVX2);
 
-              cpu_features->preferred[index_arch_Slow_SSE4_2]
-                |= bit_arch_Slow_SSE4_2;
+              cpu_features->preferred[index_arch_Slow_SSE4_2]
+                |= bit_arch_Slow_SSE4_2;
 
-              cpu_features->preferred[index_arch_AVX_Fast_Unaligned_Load]
-                &= ~bit_arch_AVX_Fast_Unaligned_Load;
+              cpu_features->preferred[index_arch_AVX_Fast_Unaligned_Load]
+                &= ~bit_arch_AVX_Fast_Unaligned_Load;
             }
         }
       else if (family == 0x7)
         {
-          if (model == 0x1b)
-            {
-              CPU_FEATURE_UNSET (cpu_features, AVX);
-              CPU_FEATURE_UNSET (cpu_features, AVX2);
-
-              cpu_features->preferred[index_arch_Slow_SSE4_2]
-                |= bit_arch_Slow_SSE4_2;
-
-              cpu_features->preferred[index_arch_AVX_Fast_Unaligned_Load]
-                &= ~bit_arch_AVX_Fast_Unaligned_Load;
-            }
-          else if (model == 0x3b)
-            {
-              CPU_FEATURE_UNSET (cpu_features, AVX);
-              CPU_FEATURE_UNSET (cpu_features, AVX2);
-
-              cpu_features->preferred[index_arch_AVX_Fast_Unaligned_Load]
-                &= ~bit_arch_AVX_Fast_Unaligned_Load;
-            }
-        }
+          switch (model)
+            {
+            case 0x1b:
+              CPU_FEATURE_UNSET (cpu_features, AVX);
+              CPU_FEATURE_UNSET (cpu_features, AVX2);
+
+              cpu_features->preferred[index_arch_Slow_SSE4_2]
+                |= bit_arch_Slow_SSE4_2;
+
+              cpu_features->preferred[index_arch_AVX_Fast_Unaligned_Load]
+                &= ~bit_arch_AVX_Fast_Unaligned_Load;
+              break;
+
+            case 0x3b:
+              CPU_FEATURE_UNSET (cpu_features, AVX);
+              CPU_FEATURE_UNSET (cpu_features, AVX2);
+
+              cpu_features->preferred[index_arch_AVX_Fast_Unaligned_Load]
+                &= ~bit_arch_AVX_Fast_Unaligned_Load;
+              break;
+
+            case 0x5b:
+            case 0x6b:
+              cpu_features->preferred[index_arch_AVX_Fast_Unaligned_Load]
+                &= ~bit_arch_AVX_Fast_Unaligned_Load;
+
+              cpu_features->preferred[index_arch_Prefer_No_VZEROUPPER]
+                |= (bit_arch_Prefer_No_VZEROUPPER
+                    | bit_arch_Fast_Unaligned_Load);
+              break;
+            }
+        }
     }
   else
     {
From: MayShao <mayshao-oc@zhaoxin.com>

Fix code indentation issues under the Zhaoxin branch.

Unaligned AVX loads are slower on KH-40000 and KX-7000, so disable
the AVX_Fast_Unaligned_Load feature.

Enable the Prefer_No_VZEROUPPER and Fast_Unaligned_Load features to
use the sse2_unaligned versions of memset, strcpy and strcat.
---
 sysdeps/x86/cpu-features.c | 66 ++++++++++++++++++++++----------------
 1 file changed, 39 insertions(+), 27 deletions(-)
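As a rough illustration of what the new 0x5b/0x6b case achieves, the sketch below shows the preferred-feature test that the change effectively flips. This is a hypothetical helper built from the bit/index macros visible in the diff, not the actual glibc IFUNC selector (the real resolvers in sysdeps/x86_64/multiarch are more involved), and it only compiles inside the glibc tree.

```c
#include <cpu-features.h>  /* glibc-internal header (sysdeps/x86).  */

/* Hypothetical predicate: after the 0x5b/0x6b case in the patch runs,
   this returns nonzero, which is roughly the condition under which the
   memset/strcpy/strcat resolvers fall back to the sse2_unaligned
   variants instead of the AVX ones.  */
static inline int
prefers_sse2_unaligned (const struct cpu_features *cpu_features)
{
  return ((cpu_features->preferred[index_arch_Prefer_No_VZEROUPPER]
           & bit_arch_Prefer_No_VZEROUPPER) != 0
          && (cpu_features->preferred[index_arch_Fast_Unaligned_Load]
              & bit_arch_Fast_Unaligned_Load) != 0
          && (cpu_features->preferred[index_arch_AVX_Fast_Unaligned_Load]
              & bit_arch_AVX_Fast_Unaligned_Load) == 0);
}
```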