Message ID | 1d8eb765-e147-534e-ed1e-daa8deb8d5a7@arm.com |
---|---|
State | New |
Headers | show |
Series | aarch64: add HWCAP_ATOMICS to HWCAP_IMPORTANT | expand |
On 19/04/2018 08:51, Szabolcs Nagy wrote: > This enables searching shared libraries in atomics/ when the hardware > supports LSE atomics of armv8.1 so one can provide optimized variants > of libraries in a portable way. > > LSE atomics does not affect library abi, the new instructions can > interoperate with old ones. > > I'm not familiar with how this feature of the dynamic linker is used > in practice by distros or others so comments are welcome. Clearlinux seems to use this to provide optimized Intel libraries [1]. > > 2018-04-19 Szabolcs Nagy <szabolcs.nagy@arm.com> > > * sysdeps/unix/sysv/linux/aarch64/dl-procinfo.h (HWCAP_IMPORTANT): Add > HWCAP_ATOMICS. I think what you want is something what x86_64 has done [2]: on cpu-features.c the code creates a list of possible processor specific paths and sets it do GLRO(dl_platform) (for instance on x86_64 if the underlying system is a haswell it will add the haswell folder path). Currently since AArch64 do not change dl_platform_init, it adds 'aarch64' from AT_PLATFORM and 'cpuid' because of HWCAP_IMPORTANT. IMHO neither does make sense as search paths, I would expect at least the 'cpu_list' from aarch cpu-features.c (maybe by excluding the 'generic' field). So I suggest to rework how aarch64 obtain the search path by setting the dl_platform in cpu-features.c: - We can get the cpu_list if HWCAP_CPUID, so add only current cpu folder if it the case. - If HWCAP_ATOMICS is set add 'lse'. - Any more required? [1] https://clearlinux.org/blogs/transparent-use-library-packages-optimized-intel-architecture [2] https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=1432d38ea04ab5e96f21a38;hp=3b5f801ddb838311b5b05c218caac3bdb00d7c95
On Thu, 2018-04-19 at 12:51 +0100, Szabolcs Nagy wrote: > This enables searching shared libraries in atomics/ when the hardware > supports LSE atomics of armv8.1 so one can provide optimized variants > of libraries in a portable way. > > LSE atomics does not affect library abi, the new instructions can > interoperate with old ones. > > I'm not familiar with how this feature of the dynamic linker is used > in practice by distros or others so comments are welcome. > > 2018-04-19 Szabolcs Nagy <szabolcs.nagy@arm.com> > > * sysdeps/unix/sysv/linux/aarch64/dl-procinfo.h > (HWCAP_IMPORTANT): Add > HWCAP_ATOMICS. I don't know if this is relavent or not but I checked in changes a few months ago to use IFUNC in libatomic so that it could use LSE (or not) depending on the aarch64 hardware it is running on. So if a user is just using libatomic calls then they don't need a separate library or a sperate search path. If they want their own libraries with versions that do or do not use LSE directly (and don't want to use IFUNCs) then this could still be needed/desired. https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00187.html https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00434.html Steve Ellcey sellcey@cavium.com
On 19/04/18 17:08, Steve Ellcey wrote: > On Thu, 2018-04-19 at 12:51 +0100, Szabolcs Nagy wrote: >> This enables searching shared libraries in atomics/ when the hardware >> supports LSE atomics of armv8.1 so one can provide optimized variants >> of libraries in a portable way. >> >> LSE atomics does not affect library abi, the new instructions can >> interoperate with old ones. >> >> I'm not familiar with how this feature of the dynamic linker is used >> in practice by distros or others so comments are welcome. >> >> 2018-04-19 Szabolcs Nagy <szabolcs.nagy@arm.com> >> >> * sysdeps/unix/sysv/linux/aarch64/dl-procinfo.h >> (HWCAP_IMPORTANT): Add >> HWCAP_ATOMICS. > > I don't know if this is relavent or not but I checked in changes a few > months ago to use IFUNC in libatomic so that it could use LSE (or not) > depending on the aarch64 hardware it is running on. So if a user is > just using libatomic calls then they don't need a separate library or a > sperate search path. If they want their own libraries with versions > that do or do not use LSE directly (and don't want to use IFUNCs) then > this could still be needed/desired. > yeah, i mainly want to enable libpthread/libc with lse atomics, it's not practical to ifunc all pthread primitives and malloc in glibc, but the armv8.1 atomic instructions may give significant benefit so we want a way to use those in distros. > https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00187.html > https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00434.html > > Steve Ellcey > sellcey@cavium.com >
On 19/04/18 15:38, Adhemerval Zanella wrote: > On 19/04/2018 08:51, Szabolcs Nagy wrote: >> This enables searching shared libraries in atomics/ when the hardware >> supports LSE atomics of armv8.1 so one can provide optimized variants >> of libraries in a portable way. >> >> LSE atomics does not affect library abi, the new instructions can >> interoperate with old ones. >> >> I'm not familiar with how this feature of the dynamic linker is used >> in practice by distros or others so comments are welcome. > > Clearlinux seems to use this to provide optimized Intel libraries [1]. > interesting thanks. >> 2018-04-19 Szabolcs Nagy <szabolcs.nagy@arm.com> >> >> * sysdeps/unix/sysv/linux/aarch64/dl-procinfo.h (HWCAP_IMPORTANT): Add >> HWCAP_ATOMICS. > > I think what you want is something what x86_64 has done [2]: on cpu-features.c > the code creates a list of possible processor specific paths and sets it do > GLRO(dl_platform) (for instance on x86_64 if the underlying system is a haswell > it will add the haswell folder path). > > Currently since AArch64 do not change dl_platform_init, it adds 'aarch64' from > AT_PLATFORM and 'cpuid' because of HWCAP_IMPORTANT. IMHO neither does make > sense as search paths, I would expect at least the 'cpu_list' from aarch > cpu-features.c (maybe by excluding the 'generic' field). > i don't know the reasons behind 'aarch64' and 'tls' search paths and i have no particular attachment to the HWCAP_IMPORTANT mechanism. > So I suggest to rework how aarch64 obtain the search path by setting the > dl_platform in cpu-features.c: > > - We can get the cpu_list if HWCAP_CPUID, so add only current cpu folder > if it the case. > > - If HWCAP_ATOMICS is set add 'lse'. > if these paths are for optimization only then i guess the list can change between libc releases without causing issues other than performance regressions. in that case i'm in favor of removing unnecessary search paths. atomics i think is a useful variant, i'll think about the cpuid based search paths, i don't want too many variants since nobody will prepare/test binaries for all uarch variants, but i do like the ability to have alternative optimized libs. > - Any more required? > > [1] https://clearlinux.org/blogs/transparent-use-library-packages-optimized-intel-architecture > [2] https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=1432d38ea04ab5e96f21a38;hp=3b5f801ddb838311b5b05c218caac3bdb00d7c95 >
On 19/04/2018 14:06, Szabolcs Nagy wrote: > On 19/04/18 15:38, Adhemerval Zanella wrote: >> On 19/04/2018 08:51, Szabolcs Nagy wrote: >>> This enables searching shared libraries in atomics/ when the hardware >>> supports LSE atomics of armv8.1 so one can provide optimized variants >>> of libraries in a portable way. >>> >>> LSE atomics does not affect library abi, the new instructions can >>> interoperate with old ones. >>> >>> I'm not familiar with how this feature of the dynamic linker is used >>> in practice by distros or others so comments are welcome. >> >> Clearlinux seems to use this to provide optimized Intel libraries [1]. >> > > interesting thanks. > >>> 2018-04-19 Szabolcs Nagy <szabolcs.nagy@arm.com> >>> >>> * sysdeps/unix/sysv/linux/aarch64/dl-procinfo.h (HWCAP_IMPORTANT): Add >>> HWCAP_ATOMICS. >> >> I think what you want is something what x86_64 has done [2]: on cpu-features.c >> the code creates a list of possible processor specific paths and sets it do >> GLRO(dl_platform) (for instance on x86_64 if the underlying system is a haswell >> it will add the haswell folder path). >> >> Currently since AArch64 do not change dl_platform_init, it adds 'aarch64' from >> AT_PLATFORM and 'cpuid' because of HWCAP_IMPORTANT. IMHO neither does make >> sense as search paths, I would expect at least the 'cpu_list' from aarch >> cpu-features.c (maybe by excluding the 'generic' field). >> > > i don't know the reasons behind 'aarch64' and 'tls' search paths > and i have no particular attachment to the HWCAP_IMPORTANT mechanism. 'aarch64' came from default code at elf/dl-sysdep.c if the platform does not override the dl_platform (and the code to set AT_PLATFORM on dl_platform is from commit 0a54e401). My guess is to provide a direct way to difference ABI folders for bi-arch system (x86_64 and i686 for instance). I think for aarch64 there is no direct gain in adding 'aarch64' in search path. > >> So I suggest to rework how aarch64 obtain the search path by setting the >> dl_platform in cpu-features.c: >> >> - We can get the cpu_list if HWCAP_CPUID, so add only current cpu folder >> if it the case. >> >> - If HWCAP_ATOMICS is set add 'lse'. >> > > if these paths are for optimization only then i guess the list > can change between libc releases without causing issues other > than performance regressions. > > in that case i'm in favor of removing unnecessary search paths. > > atomics i think is a useful variant, i'll think about the cpuid > based search paths, i don't want too many variants since nobody > will prepare/test binaries for all uarch variants, but i do like > the ability to have alternative optimized libs. I think adding just 'lse' (or other meaningful name) should be suffice for now. If cavium/qualcomm/etc desire, they can propose adding more search paths based on their requirements. > >> - Any more required? >> >> [1] https://clearlinux.org/blogs/transparent-use-library-packages-optimized-intel-architecture >> [2] https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=1432d38ea04ab5e96f21a38;hp=3b5f801ddb838311b5b05c218caac3bdb00d7c95 >> >
On 04/19/2018 01:06 PM, Szabolcs Nagy wrote: > if these paths are for optimization only then i guess the list > can change between libc releases without causing issues other > than performance regressions. There is a "full lifecycle" cost to turning on multilibs. In RPM transactions the new libraries are installed and then the old libraries are removed. This means that you can have an old multilib'd library present in the middle of the install, and if you are deprecating the multilib, the remaining library will be the old one, it may be loaded, and may not work correctly. Consider this case: * Add lse/ search dir if 8.1 LSE. * Deploy multilibs. * Deprecate multilib because performance is good enough by default and you want to stop paying for the build/test/qe costs of two builds. * Remove multilibs. At this point all multilib'd packages must deploy a "<package>_post_upgrade" application which erases all known multilib'd files to avoid loading a mixed lse/ library with a newer set of libraries. We do this in glibc via glibc_post_upgrade.c in Fedora [1]. One alternative is to deprecate the lse/ search dir immediately in ld.so, which is not always desirable because some applications may still be relying on it to load/operate correctly. So think carefully before turn on multilibs. The additional verification and deployment costs should not be ignored. As a reference, presently in Fedora for POWER we have 4 x ppc64be multilibs, and only 1 x ppc64le multilib. This is relatively under control because the ISAs are nested well and Fedora has all POWER8 builders (the latest multilib).
On 04/19/2018 07:51 AM, Szabolcs Nagy wrote: > This enables searching shared libraries in atomics/ when the hardware > supports LSE atomics of armv8.1 so one can provide optimized variants > of libraries in a portable way. > > LSE atomics does not affect library abi, the new instructions can > interoperate with old ones. > > I'm not familiar with how this feature of the dynamic linker is used > in practice by distros or others so comments are welcome. > > 2018-04-19 Szabolcs Nagy <szabolcs.nagy@arm.com> > > * sysdeps/unix/sysv/linux/aarch64/dl-procinfo.h (HWCAP_IMPORTANT): Add > HWCAP_ATOMICS. This is your choice to enable, but you must convince downstream distributions that it makes sense. Adhemerval's technical comments are on point regarding the patch and I'd follow his advice, though as you say one dir for each uarch is probably not the right fit for AArch64. Really, this is the "age old" question of building a multilib variant of an architecture and compiling for a specific ISA or ISA extension. If you turn this on, beware of the additional costs and complications, as I commented downthread: https://www.sourceware.org/ml/libc-alpha/2018-04/msg00624.html For distributions the key technical metric for deciding to multilib is performance. What performance improvements do you see when you inline the 8.1 LSE atomics? Across one application? Across the system? Across different workloads? Does that performance translate into real-world gains? The cost to a distro is not insubstantial, it is additional errata/QE (bodhi), requirements for more ARM hardware (8.1 LSE) to test upon, build routing requirements to route all builds to the new hardware for automated testing (koji), additional boxes for CI exposure (koeshi), etc. etc. Each distro must balance the value this brings against the cost to the users and customers. Your job is going to be to convince them that the performance is so good that it outweighs the costs. I'm excited to be convinced :-)
diff --git a/sysdeps/unix/sysv/linux/aarch64/dl-procinfo.h b/sysdeps/unix/sysv/linux/aarch64/dl-procinfo.h index 6887713149..4530cc2159 100644 --- a/sysdeps/unix/sysv/linux/aarch64/dl-procinfo.h +++ b/sysdeps/unix/sysv/linux/aarch64/dl-procinfo.h @@ -27,9 +27,9 @@ /* We cannot provide a general printing function. */ #define _dl_procinfo(type, word) -1 -/* HWCAP_CPUID should be available by default to influence IFUNC as well as - library search. */ -#define HWCAP_IMPORTANT HWCAP_CPUID +/* Default hwcap_mask setting, affects the library search path and cpu_features + used by glibc internal IFUNCs when the selected hwcaps are available. */ +#define HWCAP_IMPORTANT (HWCAP_CPUID | HWCAP_ATOMICS) static inline const char * __attribute__ ((unused))