Message ID | 1723445305-99403-4-git-send-email-wangfeifei@hygon.cn |
---|---|
State | New |
Series | x86: Add support for Hygon processors |
On 12/08/24 03:48, Feifei Wang wrote:
> This patch is based on the following new flag patch:
> https://patchwork.sourceware.org/project/glibc/patch/20240811055619.2863839-1-goldstein.w.n@gmail.com/
>

This patch fails to build for 32-bit:
https://www.delorie.com/trybots/32bit/37310/make.tail.txt

> After the new cpu-flag 'Prefer_Non_Temporal' is added in glibc,
> this patch can be enabled to access the non-temporal memset
> implementation for hygon processors.
>
> Test Results:
> thread: 1
> memset store value: 0
>
> hygon1 arch
> x86_memset_non_temporal_threshold = 8MB
> size                          new performance / old performance
> 128 byte(2x -4x vec case)     1
> 256 byte(4x - 8x vec case)    1
> 512 byte( > 8x loop case)     1
> 1MB                           0.994
> 4MB                           0.996
> 8MB                           0.670
> 16MB                          0.343
> 32MB                          0.355
>
> hygon2 arch
> x86_memset_non_temporal_threshold = 8MB
> size                          new performance / old performance
> 128 byte(2x -4x vec case)     1
> 256 byte(4x - 8x vec case)    0.653
> 512 byte( > 8x loop case)     0.713
> 1MB                           1
> 4MB                           0.887
> 8MB                           1.312
> 16MB                          0.822
> 32MB                          0.830
>
> hygon3 arch
> x86_memset_non_temporal_threshold = 8MB
> size                          new performance / old performance
> 128 byte(2x -4x vec case)     1
> 256 byte(4x - 8x vec case)    1
> 512 byte( > 8x loop case)     1
> 1MB                           1
> 4MB                           0.990
> 8MB                           0.737
> 16MB                          0.390
> 32MB                          0.401
>
> For hygon arch with this patch, no performance degradation on '2x - 8x branch case'
> when extra branch jump added. And with this patch, non-temporal stores can improve
> performance by 20% - 65%.
>
> Signed-off-by: Feifei Wang <wangfeifei@hygon.cn>
> Reviewed-by: Jing Li <lijing@hygon.cn>
> ---
>  sysdeps/x86/cpu-features.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
> index 034dc28f64..cae26babc7 100644
> --- a/sysdeps/x86/cpu-features.c
> +++ b/sysdeps/x86/cpu-features.c
> @@ -1098,6 +1098,12 @@ https://www.intel.com/content/www/us/en/support/articles/000059422/processors.ht
>        get_extended_indices (cpu_features);
>
>        update_active (cpu_features);
> +
> +      /* Use Prefer_Non_Temporal flag to access the non-temporal
> +         memset implementation due to ERMS is disable in Hygon
> +         processors.  */
> +      cpu_features->preferred[index_arch_Prefer_Non_Temporal]
> +        |= (bit_arch_Prefer_Non_Temporal);
>      }
>    else
>      {
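
For readers less familiar with the term, the idea behind a non-temporal memset with a size threshold can be sketched in C as follows. This is only an illustration under stated assumptions, not glibc's implementation (which is hand-written assembly): the function name `memset_nt_sketch` and its `nt_threshold` parameter are hypothetical, and the sketch uses SSE2 streaming stores where available, falling back to plain `memset` elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
# include <emmintrin.h>
#endif

/* Hypothetical helper, not a glibc function.  Fills N bytes at DST with
   the byte C, using cache-bypassing (non-temporal) stores once N reaches
   NT_THRESHOLD and ordinary stores below it.  */
static void *
memset_nt_sketch (void *dst, int c, size_t n, size_t nt_threshold)
{
#if defined(__SSE2__)
  if (n >= nt_threshold)
    {
      unsigned char *p = dst;

      /* Fill the unaligned head with ordinary stores so that the
         streaming stores below hit 16-byte aligned addresses.  */
      size_t head = (16 - ((uintptr_t) p & 15)) & 15;
      if (head > n)
        head = n;
      memset (p, c, head);
      p += head;
      n -= head;

      __m128i v = _mm_set1_epi8 ((char) c);
      for (; n >= 16; p += 16, n -= 16)
        _mm_stream_si128 ((__m128i *) p, v);

      /* Streaming stores are weakly ordered, so fence before the
         caller can observe the result.  */
      _mm_sfence ();

      /* Remaining tail of fewer than 16 bytes.  */
      memset (p, c, n);
      return dst;
    }
#else
  (void) nt_threshold;
#endif
  return memset (dst, c, n);
}
```

With a threshold of 8MB, as in the quoted test setup, a call such as `memset_nt_sketch (buf, 0, 16 << 20, 8 << 20)` would take the streaming path, while a 1MB fill would go through ordinary stores.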
On Sun, Aug 11, 2024 at 11:49 PM Feifei Wang <wangfeifei@hygon.cn> wrote:
>
> This patch is based on the following new flag patch:
> https://patchwork.sourceware.org/project/glibc/patch/20240811055619.2863839-1-goldstein.w.n@gmail.com/

Please wait until the above patch has been reviewed and committed.

> After the new cpu-flag 'Prefer_Non_Temporal' is added in glibc,
> this patch can be enabled to access the non-temporal memset
> implementation for hygon processors.
>
> [...]
> -----Original Message-----
> From: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
> Sent: August 12, 2024 21:02
> To: Feifei Wang <wangfeifei@hygon.cn>; libc-alpha@sourceware.org
> Cc: hjl.tools@gmail.com; carlos@redhat.com; fw@deneb.enyo.de;
> goldstein.w.n@gmail.com; Jing Li <lijing@hygon.cn>
> Subject: Re: [RFC PATCH 3/3] x86: Enable non-temporal memset for Hygon
> processors
>
> On 12/08/24 03:48, Feifei Wang wrote:
> > This patch is based on the following new flag patch:
> > https://patchwork.sourceware.org/project/glibc/patch/20240811055619.2863839-1-goldstein.w.n@gmail.com/
> >
>
> This patch fails to build for 32-bit:
> https://www.delorie.com/trybots/32bit/37310/make.tail.txt

This patch is based on the new flag patch above; once that patch is merged, this one can be built successfully.

> > After the new cpu-flag 'Prefer_Non_Temporal' is added in glibc, this
> > patch can be enabled to access the non-temporal memset implementation
> > for hygon processors.
> >
> > [...]
> -----Original Message-----
> From: H.J. Lu <hjl.tools@gmail.com>
> Sent: August 12, 2024 21:12
> To: Feifei Wang <wangfeifei@hygon.cn>
> Cc: libc-alpha@sourceware.org; carlos@redhat.com; fw@deneb.enyo.de;
> goldstein.w.n@gmail.com; Jing Li <lijing@hygon.cn>
> Subject: Re: [RFC PATCH 3/3] x86: Enable non-temporal memset for Hygon
> processors
>
> On Sun, Aug 11, 2024 at 11:49 PM Feifei Wang <wangfeifei@hygon.cn> wrote:
> >
> > This patch is based on the following new flag patch:
> > https://patchwork.sourceware.org/project/glibc/patch/20240811055619.2863839-1-goldstein.w.n@gmail.com/
>
> Please wait until the above patch has been reviewed and committed.
>

That's fine.

> > After the new cpu-flag 'Prefer_Non_Temporal' is added in glibc, this
> > patch can be enabled to access the non-temporal memset implementation
> > for hygon processors.
> >
> > [...]
>
> --
> H.J.
diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
index 034dc28f64..cae26babc7 100644
--- a/sysdeps/x86/cpu-features.c
+++ b/sysdeps/x86/cpu-features.c
@@ -1098,6 +1098,12 @@ https://www.intel.com/content/www/us/en/support/articles/000059422/processors.ht
       get_extended_indices (cpu_features);
 
       update_active (cpu_features);
+
+      /* Use Prefer_Non_Temporal flag to access the non-temporal
+         memset implementation due to ERMS is disable in Hygon
+         processors.  */
+      cpu_features->preferred[index_arch_Prefer_Non_Temporal]
+        |= (bit_arch_Prefer_Non_Temporal);
     }
   else
     {
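
The two added code lines use an index-plus-bitmask pattern: `index_arch_Prefer_Non_Temporal` selects a word in the `preferred` array and `bit_arch_Prefer_Non_Temporal` selects a bit within that word. A minimal standalone sketch of that pattern is given here for orientation. The struct layout, the array size, and the numeric values of the two macros are assumptions made up for this example; in glibc they are internal definitions and may look different.

```c
#include <assert.h>

/* Hypothetical, simplified stand-ins for glibc-internal definitions.
   The values below are arbitrary and chosen only for illustration.  */
#define PREFERRED_WORDS 1
#define index_arch_Prefer_Non_Temporal 0
#define bit_arch_Prefer_Non_Temporal (1u << 3)

struct cpu_features_sketch
{
  unsigned int preferred[PREFERRED_WORDS];
};

/* Mirrors the statement added by the patch: set the preference bit
   without disturbing other bits in the same word.  */
static void
prefer_non_temporal (struct cpu_features_sketch *f)
{
  f->preferred[index_arch_Prefer_Non_Temporal]
    |= bit_arch_Prefer_Non_Temporal;
}

/* How a consumer of the flag might test it before choosing a
   memset strategy.  Returns nonzero if the preference is set.  */
static int
prefers_non_temporal (const struct cpu_features_sketch *f)
{
  return (f->preferred[index_arch_Prefer_Non_Temporal]
          & bit_arch_Prefer_Non_Temporal) != 0;
}
```

Because the update is an OR into the existing word, any other preference bits that were set earlier during CPU feature initialization remain intact.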