Message ID | 20220624201216.3783855-1-goldstein.w.n@gmail.com |
---|---|
State | New |
Series | [v2] x86: Add missing Slow_SSE4_2 to ifunc-sse4_2.h |
On Fri, Jun 24, 2022 at 1:12 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
>
> The functions that use this ifunc are strspn, strcspn, and strpbrk.
>
> All of these functions use pcmpstri which can be slow on some
> processors (checked by Slow_SSE4_2).
> ---
>  sysdeps/x86_64/multiarch/ifunc-sse4_2.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/sysdeps/x86_64/multiarch/ifunc-sse4_2.h b/sysdeps/x86_64/multiarch/ifunc-sse4_2.h
> index ee36525bcf..1830597862 100644
> --- a/sysdeps/x86_64/multiarch/ifunc-sse4_2.h
> +++ b/sysdeps/x86_64/multiarch/ifunc-sse4_2.h
> @@ -27,7 +27,8 @@ IFUNC_SELECTOR (void)
>  {
>    const struct cpu_features* cpu_features = __get_cpu_features ();
>
> -  if (CPU_FEATURE_USABLE_P (cpu_features, SSE4_2))
> +  if (CPU_FEATURE_USABLE_P (cpu_features, SSE4_2)
> +      && !CPU_FEATURES_ARCH_P (cpu_features, Slow_SSE4_2))
>      return OPTIMIZE (sse42);
>
>    return OPTIMIZE (generic);
> --
> 2.34.1
>

Slow SSE4.2 is slow relative to the SSE2 version written in assembly. It may
still be faster than the generic version in C.
diff --git a/sysdeps/x86_64/multiarch/ifunc-sse4_2.h b/sysdeps/x86_64/multiarch/ifunc-sse4_2.h
index ee36525bcf..1830597862 100644
--- a/sysdeps/x86_64/multiarch/ifunc-sse4_2.h
+++ b/sysdeps/x86_64/multiarch/ifunc-sse4_2.h
@@ -27,7 +27,8 @@ IFUNC_SELECTOR (void)
 {
   const struct cpu_features* cpu_features = __get_cpu_features ();
 
-  if (CPU_FEATURE_USABLE_P (cpu_features, SSE4_2))
+  if (CPU_FEATURE_USABLE_P (cpu_features, SSE4_2)
+      && !CPU_FEATURES_ARCH_P (cpu_features, Slow_SSE4_2))
     return OPTIMIZE (sse42);
 
   return OPTIMIZE (generic);
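
To make the review concern concrete, here is a minimal standalone sketch of the decision the patched selector makes. It is not glibc code: `struct mock_cpu_features` and `select_variant` are hypothetical stand-ins for glibc's internal `struct cpu_features` and `IFUNC_SELECTOR`, and the returned strings only name the variant that `OPTIMIZE` would pick.

```c
#include <stdbool.h>

/* Hypothetical stand-in for glibc's internal struct cpu_features.
   Only the two bits the patched selector consults are modelled.  */
struct mock_cpu_features
{
  bool sse4_2_usable;   /* models CPU_FEATURE_USABLE_P (..., SSE4_2) */
  bool slow_sse4_2;     /* models CPU_FEATURES_ARCH_P (..., Slow_SSE4_2) */
};

/* Mirrors the patched condition: take the SSE4.2 variant only when the
   feature is usable and the CPU is not flagged Slow_SSE4_2.  This
   selector has just two outcomes, so every other case ends up in the
   generic C variant; there is no SSE2 assembly variant to fall back to.  */
static const char *
select_variant (const struct mock_cpu_features *f)
{
  if (f->sse4_2_usable && !f->slow_sse4_2)
    return "sse42";

  return "generic";
}
```

With `sse4_2_usable` and `slow_sse4_2` both set, the sketch returns "generic". That is the case the reply is about: such a CPU moves from the pcmpstri-based code to the C implementation, which may or may not be faster. Only benchmarks on the affected processors can settle that.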