Message ID | 20221001190911.2994478-1-aurelien@aurel32.net |
---|---|
Headers | show |
Series | x86: Fix AVX2 string functions requiring BMI2 or LZCNT (BZ #29611) | expand |
On Sat, Oct 1, 2022 at 12:09 PM Aurelien Jarno <aurelien@aurel32.net> wrote: > > Some early Intel Haswell CPU have AVX2 instructions, but do not have > BMI2 instructions. Some AVX2 string functions only check for AVX2, but > use BMI2 or LZCNT instructions. This patchset tries to fix that. > > While most fixes only change ifunc-impl-list.c, and thus only concerns > the testsuite, the strn(case)cmp is a real issue affecting early Intel > Haswell CPU, reported to affect Debian Sid and Fedora Rawhide. > > On the other hand, the check for LZCNT in memrchr is purely for > correctness, I am not aware of a CPU implementing AVX2 without LZCNT. > > This has been tested by remplacing all BMI2 and LZCNT instruction in the > source code by the "ud2" instruction and disabling the BMI1, BMI2 > feature detection, and running the testsuite. > > Resolves: BZ #29611 > > Aurelien Jarno (4): > x86: include BMI1 and BMI2 in x86-64-v3 level > x86-64: Require BMI2 for AVX2 strn(case)cmp and wcsncmp > implementations > x86-64: Require BMI2 for AVX2 (raw|w)memchr implementations > x86-64: Require LZCNT for AVX2 memrchr implementation > We also need BMI2 check in ifunc-impl-list for: strcasecmp strcmp strcasecmp_l strrchr wcsrchr wcscmp If you want you can make patches, otherwise I can. > sysdeps/x86/get-isa-level.h | 2 + > sysdeps/x86_64/multiarch/ifunc-avx2.h | 1 + > sysdeps/x86_64/multiarch/ifunc-impl-list.c | 44 +++++++++++++++------ > sysdeps/x86_64/multiarch/ifunc-strcasecmp.h | 1 + > sysdeps/x86_64/multiarch/strncmp.c | 4 +- > 5 files changed, 38 insertions(+), 14 deletions(-) > > -- > 2.35.1 >
On 2022-10-01 15:11, Noah Goldstein wrote: > On Sat, Oct 1, 2022 at 12:09 PM Aurelien Jarno <aurelien@aurel32.net> wrote: > > > > Some early Intel Haswell CPU have AVX2 instructions, but do not have > > BMI2 instructions. Some AVX2 string functions only check for AVX2, but > > use BMI2 or LZCNT instructions. This patchset tries to fix that. > > > > While most fixes only change ifunc-impl-list.c, and thus only concerns > > the testsuite, the strn(case)cmp is a real issue affecting early Intel > > Haswell CPU, reported to affect Debian Sid and Fedora Rawhide. > > > > On the other hand, the check for LZCNT in memrchr is purely for > > correctness, I am not aware of a CPU implementing AVX2 without LZCNT. > > > > This has been tested by remplacing all BMI2 and LZCNT instruction in the > > source code by the "ud2" instruction and disabling the BMI1, BMI2 > > feature detection, and running the testsuite. > > > > Resolves: BZ #29611 > > > > Aurelien Jarno (4): > > x86: include BMI1 and BMI2 in x86-64-v3 level > > x86-64: Require BMI2 for AVX2 strn(case)cmp and wcsncmp > > implementations > > x86-64: Require BMI2 for AVX2 (raw|w)memchr implementations > > x86-64: Require LZCNT for AVX2 memrchr implementation > > > > We also need BMI2 check in ifunc-impl-list for: > strcasecmp > strcmp > strcasecmp_l I didn't included those, because 'bzhil' is only used the 'n' case. That said with the 'shlx' you mentioned in the other email, we should indeed include that one. > strrchr > wcsrchr > wcscmp Same for those, I missed 'shlx'. I'll fix that.
On Sat, Oct 1, 2022 at 3:11 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote: > > On Sat, Oct 1, 2022 at 12:09 PM Aurelien Jarno <aurelien@aurel32.net> wrote: > > > > Some early Intel Haswell CPU have AVX2 instructions, but do not have > > BMI2 instructions. Some AVX2 string functions only check for AVX2, but > > use BMI2 or LZCNT instructions. This patchset tries to fix that. > > > > While most fixes only change ifunc-impl-list.c, and thus only concerns > > the testsuite, the strn(case)cmp is a real issue affecting early Intel > > Haswell CPU, reported to affect Debian Sid and Fedora Rawhide. > > > > On the other hand, the check for LZCNT in memrchr is purely for > > correctness, I am not aware of a CPU implementing AVX2 without LZCNT. > > > > This has been tested by remplacing all BMI2 and LZCNT instruction in the > > source code by the "ud2" instruction and disabling the BMI1, BMI2 > > feature detection, and running the testsuite. > > > > Resolves: BZ #29611 > > > > Aurelien Jarno (4): > > x86: include BMI1 and BMI2 in x86-64-v3 level > > x86-64: Require BMI2 for AVX2 strn(case)cmp and wcsncmp > > implementations > > x86-64: Require BMI2 for AVX2 (raw|w)memchr implementations > > x86-64: Require LZCNT for AVX2 memrchr implementation > > > > We also need BMI2 check in ifunc-impl-list for: > strcasecmp > strcmp > strcasecmp_l > strrchr > wcsrchr > wcscmp > > If you want you can make patches, otherwise I can. This is a duplicate of a comment I left in the strn(case)cmp patchset, but leaving here so the information is not scattered: The ifunc change in strncmp.c and ifunc-strcasecmp.h need to be backport to 2.33, 2.34, 2.35. Also separate changes for ifunc need to be backport to strncmp.c: 2.32, 2.31, 2.30, 2.29, 2.28 for a `tzcnt` usage that needs BMI1. Finally a corresponding fix is needed for strcmp.c as well (there is missing BMI2 check in strcmp.c ifunc selection as well as missing checks in the impl list). > > sysdeps/x86/get-isa-level.h | 2 + > > sysdeps/x86_64/multiarch/ifunc-avx2.h | 1 + > > sysdeps/x86_64/multiarch/ifunc-impl-list.c | 44 +++++++++++++++------ > > sysdeps/x86_64/multiarch/ifunc-strcasecmp.h | 1 + > > sysdeps/x86_64/multiarch/strncmp.c | 4 +- > > 5 files changed, 38 insertions(+), 14 deletions(-) > > > > -- > > 2.35.1 > >
On 2022-10-01 15:17, Noah Goldstein via Libc-alpha wrote: > On Sat, Oct 1, 2022 at 3:11 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote: > > > > On Sat, Oct 1, 2022 at 12:09 PM Aurelien Jarno <aurelien@aurel32.net> wrote: > > > > > > Some early Intel Haswell CPU have AVX2 instructions, but do not have > > > BMI2 instructions. Some AVX2 string functions only check for AVX2, but > > > use BMI2 or LZCNT instructions. This patchset tries to fix that. > > > > > > While most fixes only change ifunc-impl-list.c, and thus only concerns > > > the testsuite, the strn(case)cmp is a real issue affecting early Intel > > > Haswell CPU, reported to affect Debian Sid and Fedora Rawhide. > > > > > > On the other hand, the check for LZCNT in memrchr is purely for > > > correctness, I am not aware of a CPU implementing AVX2 without LZCNT. > > > > > > This has been tested by remplacing all BMI2 and LZCNT instruction in the > > > source code by the "ud2" instruction and disabling the BMI1, BMI2 > > > feature detection, and running the testsuite. > > > > > > Resolves: BZ #29611 > > > > > > Aurelien Jarno (4): > > > x86: include BMI1 and BMI2 in x86-64-v3 level > > > x86-64: Require BMI2 for AVX2 strn(case)cmp and wcsncmp > > > implementations > > > x86-64: Require BMI2 for AVX2 (raw|w)memchr implementations > > > x86-64: Require LZCNT for AVX2 memrchr implementation > > > > > > > We also need BMI2 check in ifunc-impl-list for: > > strcasecmp > > strcmp > > strcasecmp_l > > strrchr > > wcsrchr > > wcscmp > > > > If you want you can make patches, otherwise I can. > > This is a duplicate of a comment I left in the strn(case)cmp patchset, > but leaving here so the information is not scattered: > > The ifunc change in strncmp.c and ifunc-strcasecmp.h need to be backport > to 2.33, 2.34, 2.35. > > Also separate changes for ifunc need to be backport to strncmp.c: > 2.32, 2.31, 2.30, 2.29, 2.28 for a `tzcnt` usage that needs > BMI1. Is that really correct? According the commit log TZCNT is used in a way that is compatible with BSF: commit 1457016337072d1b6739f571846b619596990cb7 Author: Leonardo Sandoval <leonardo.sandoval.gonzalez@linux.intel.com> Date: Thu May 3 11:09:30 2018 -0500 x86-64: Optimize strcmp/wcscmp and strncmp/wcsncmp with AVX2 Optimize x86-64 strcmp/wcscmp and strncmp/wcsncmp with AVX2. It uses vector comparison as much as possible. Peak performance observed on a SkyLake machine: 9x, 3x, 2.5x and 5.5x for strcmp, strncmp, wcscmp and wcsncmp, respectively. The larger the comparison length, the more benefit using avx2 functions, except on the strcmp, where peak is observed at length == 32 bytes. Select AVX2 strcmp/wcscmp on AVX2 machines where vzeroupper is preferred and AVX unaligned load is fast. NB: It uses TZCNT instead of BSF since TZCNT produces the same result as BSF for non-zero input. TZCNT is faster than BSF and is executed as BSF if machine doesn't support TZCNT.
On Sun, Oct 2, 2022 at 2:35 AM Aurelien Jarno <aurelien@aurel32.net> wrote: > > On 2022-10-01 15:17, Noah Goldstein via Libc-alpha wrote: > > On Sat, Oct 1, 2022 at 3:11 PM Noah Goldstein <goldstein.w.n@gmail.com> wrote: > > > > > > On Sat, Oct 1, 2022 at 12:09 PM Aurelien Jarno <aurelien@aurel32.net> wrote: > > > > > > > > Some early Intel Haswell CPU have AVX2 instructions, but do not have > > > > BMI2 instructions. Some AVX2 string functions only check for AVX2, but > > > > use BMI2 or LZCNT instructions. This patchset tries to fix that. Think you're right. > > > > > > > > While most fixes only change ifunc-impl-list.c, and thus only concerns > > > > the testsuite, the strn(case)cmp is a real issue affecting early Intel > > > > Haswell CPU, reported to affect Debian Sid and Fedora Rawhide. > > > > > > > > On the other hand, the check for LZCNT in memrchr is purely for > > > > correctness, I am not aware of a CPU implementing AVX2 without LZCNT. > > > > > > > > This has been tested by remplacing all BMI2 and LZCNT instruction in the > > > > source code by the "ud2" instruction and disabling the BMI1, BMI2 > > > > feature detection, and running the testsuite. > > > > > > > > Resolves: BZ #29611 > > > > > > > > Aurelien Jarno (4): > > > > x86: include BMI1 and BMI2 in x86-64-v3 level > > > > x86-64: Require BMI2 for AVX2 strn(case)cmp and wcsncmp > > > > implementations > > > > x86-64: Require BMI2 for AVX2 (raw|w)memchr implementations > > > > x86-64: Require LZCNT for AVX2 memrchr implementation > > > > > > > > > > We also need BMI2 check in ifunc-impl-list for: > > > strcasecmp > > > strcmp > > > strcasecmp_l > > > strrchr > > > wcsrchr > > > wcscmp > > > > > > If you want you can make patches, otherwise I can. > > > > This is a duplicate of a comment I left in the strn(case)cmp patchset, > > but leaving here so the information is not scattered: > > > > The ifunc change in strncmp.c and ifunc-strcasecmp.h need to be backport > > to 2.33, 2.34, 2.35. > > > > Also separate changes for ifunc need to be backport to strncmp.c: > > 2.32, 2.31, 2.30, 2.29, 2.28 for a `tzcnt` usage that needs > > BMI1. > > Is that really correct? According the commit log TZCNT is used in a way > that is compatible with BSF: > > commit 1457016337072d1b6739f571846b619596990cb7 > Author: Leonardo Sandoval <leonardo.sandoval.gonzalez@linux.intel.com> > Date: Thu May 3 11:09:30 2018 -0500 > > x86-64: Optimize strcmp/wcscmp and strncmp/wcsncmp with AVX2 > > Optimize x86-64 strcmp/wcscmp and strncmp/wcsncmp with AVX2. It uses vector > comparison as much as possible. Peak performance observed on a SkyLake > machine: 9x, 3x, 2.5x and 5.5x for strcmp, strncmp, wcscmp and wcsncmp, > respectively. The larger the comparison length, the more benefit using > avx2 functions, except on the strcmp, where peak is observed at length > == 32 bytes. Select AVX2 strcmp/wcscmp on AVX2 machines where vzeroupper > is preferred and AVX unaligned load is fast. > > NB: It uses TZCNT instead of BSF since TZCNT produces the same result > as BSF for non-zero input. TZCNT is faster than BSF and is executed > as BSF if machine doesn't support TZCNT. > > -- > Aurelien Jarno GPG: 4096R/1DDD8C9B > aurelien@aurel32.net http://www.aurel32.net