Message ID | 20230404102847.3303623-1-mpe@ellerman.id.au (mailing list archive) |
---|---|
State | Accepted |
Commit | 78f0929884d4811c225fd2c57ecc602c84c07392 |
Headers | show |
Series | powerpc/64: Always build with 128-bit long double | expand |
Context | Check | Description |
---|---|---|
snowpatch_ozlabs/github-powerpc_ppctests | success | Successfully ran 8 jobs. |
snowpatch_ozlabs/github-powerpc_selftests | success | Successfully ran 8 jobs. |
snowpatch_ozlabs/github-powerpc_kernel_qemu | success | Successfully ran 24 jobs. |
snowpatch_ozlabs/github-powerpc_clang | success | Successfully ran 6 jobs. |
snowpatch_ozlabs/github-powerpc_sparse | success | Successfully ran 4 jobs. |
Hi! On Tue, Apr 04, 2023 at 08:28:47PM +1000, Michael Ellerman wrote: > The amdgpu driver builds some of its code with hard-float enabled, > whereas the rest of the kernel is built with soft-float. > > When building with 64-bit long double, if soft-float and hard-float > objects are linked together, the build fails due to incompatible ABI > tags. > Currently those build errors are avoided because the amdgpu driver is > gated on 128-bit long double being enabled. But that's not a detail the > amdgpu driver should need to be aware of, and if another driver starts > using hard-float the same problem would occur. Well. The kernel driver either has no business using long double (or any other floating point even) at all, or it should know exactly what is used: double precision, double-double, or quadruple precision. Both of the latter two are 128 bits. > All versions of the 64-bit ABI specify that long-double is 128-bits. > However some compilers, notably the kernel.org ones, are built to use > 64-bit long double by default. Mea culpa, I suppose? But builddall doesn't force 64 bit explicitly. I wonder how this happened? Is it maybe a problem in the powerpc64le config in GCC itself? I have a patch from summer last year (Arnd's toolchains are built without it) that does + powerpc64le-*) TARGET_GCC_CONF=--with-long-double-128 Unfortunately I don't remember why I did that, and I never investigated what the deeper problem is :-/ In either case, the kernel should always use specific types, not rely on the toolchain to pick a type that may or may not work. The correct size floating point type alone is not enough, but it is a step in the right direction certainly. Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org> Segher
Segher Boessenkool <segher@kernel.crashing.org> writes: > On Tue, Apr 04, 2023 at 08:28:47PM +1000, Michael Ellerman wrote: >> The amdgpu driver builds some of its code with hard-float enabled, >> whereas the rest of the kernel is built with soft-float. >> >> When building with 64-bit long double, if soft-float and hard-float >> objects are linked together, the build fails due to incompatible ABI >> tags. > >> Currently those build errors are avoided because the amdgpu driver is >> gated on 128-bit long double being enabled. But that's not a detail the >> amdgpu driver should need to be aware of, and if another driver starts >> using hard-float the same problem would occur. > > Well. The kernel driver either has no business using long double (or > any other floating point even) at all, or it should know exactly what is > used: double precision, double-double, or quadruple precision. Both of > the latter two are 128 bits. In a perfect world ... :) >> All versions of the 64-bit ABI specify that long-double is 128-bits. >> However some compilers, notably the kernel.org ones, are built to use >> 64-bit long double by default. > > Mea culpa, I suppose? But builddall doesn't force 64 bit explicitly. > I wonder how this happened? Is it maybe a problem in the powerpc64le > config in GCC itself? Not blaming anyone, just one of those things that happens. The toolchains the distros (Ubuntu/Fedora) build all seem to use 128, but possibly that's because someone told them to configure them that way at some point. > I have a patch from summer last year (Arnd's > toolchains are built without it) that does > + powerpc64le-*) TARGET_GCC_CONF=--with-long-double-128 > Unfortunately I don't remember why I did that, and I never investigated > what the deeper problem is :-/ Last summer (aka winter) is when we first discovered this issue with the long double size being implicated. See: https://git.kernel.org/torvalds/c/c653c591789b3acfa4bf6ae45d5af4f330e50a91 So I guess that's what prompted your patch? > In either case, the kernel should always use specific types, not rely on > the toolchain to pick a type that may or may not work. The correct size > floating point type alone is not enough, but it is a step in the right > direction certainly. > > Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org> Thanks. cheers
Hi! On Wed, Apr 05, 2023 at 03:32:21PM +1000, Michael Ellerman wrote: > Segher Boessenkool <segher@kernel.crashing.org> writes: > > On Tue, Apr 04, 2023 at 08:28:47PM +1000, Michael Ellerman wrote: > >> The amdgpu driver builds some of its code with hard-float enabled, > >> whereas the rest of the kernel is built with soft-float. > >> > >> When building with 64-bit long double, if soft-float and hard-float > >> objects are linked together, the build fails due to incompatible ABI > >> tags. > > > >> Currently those build errors are avoided because the amdgpu driver is > >> gated on 128-bit long double being enabled. But that's not a detail the > >> amdgpu driver should need to be aware of, and if another driver starts > >> using hard-float the same problem would occur. > > > > Well. The kernel driver either has no business using long double (or > > any other floating point even) at all, or it should know exactly what is > > used: double precision, double-double, or quadruple precision. Both of > > the latter two are 128 bits. > > In a perfect world ... :) Well, without it knowing what exactly it calculates, does this code have any business running in kernel space? Is it acceptable to just do random things in the kernel? I don't know the kernel code that uses long double at all (and I'm afraid to look for fear of going blind), but all this sounds like the 64-bit IEEE double precision floating point is not good enough for some certain calculation, but 80-bit extended double precision as used on x86 is. That does make it likely that both of our 128-bit formats would work, but there are lots and lots of "buts". To start with, what does that code require wrt fp contraction (so, floating multiply-add)? All of this suggests that there should not be floating point code here *at all*, it is harder to use it in any acceptable way than to just do things in fixed point or scaled integer or whatever. > >> All versions of the 64-bit ABI specify that long-double is 128-bits. > >> However some compilers, notably the kernel.org ones, are built to use > >> 64-bit long double by default. > > > > Mea culpa, I suppose? But buildall doesn't force 64 bit explicitly. > > I wonder how this happened? Is it maybe a problem in the powerpc64le > > config in GCC itself? > > Not blaming anyone, just one of those things that happens. Oh I didn't say anyone is blaming me. I want to fix the problem, that is all :-) > The > toolchains the distros (Ubuntu/Fedora) build all seem to use 128, but > possibly that's because someone told them to configure them that way at > some point. No, or yes, depending on how you look at it? Default configurations all have 128-bit long double. But buildall uses (almost) the same configuration on all targets, namely: $GCC_SRC/configure \ --target=$TARGET --enable-targets=all --prefix=$PREFIX \ --enable-languages=c --without-headers --disable-bootstrap \ --disable-nls --disable-threads --disable-shared \ --disable-libmudflap --disable-libssp --disable-libgomp \ --disable-decimal-float --disable-libquadmath \ --disable-libatomic --disable-libcc1 --disable-libmpx All of this is perfectly reasonable imnsho, but I guess the --enable-targets=all causes the problem here? That makes no sense, but it is still my best guess. > > I have a patch from summer last year (Arnd's > > toolchains are built without it) that does > > + powerpc64le-*) TARGET_GCC_CONF=--with-long-double-128 > > Unfortunately I don't remember why I did that, and I never investigated > > what the deeper problem is :-/ > > Last summer (aka winter) Oh right. Last July :-) > is when we first discovered this issue with the > long double size being implicated. > > See: > https://git.kernel.org/torvalds/c/c653c591789b3acfa4bf6ae45d5af4f330e50a91 > > So I guess that's what prompted your patch? It was one day before my patch, maybe less than 12h even, so that could be. I don't update the kernel source automatically though (there are 50 to 100 build breaks every year, when things are in decent state I tend to keep it for a while). But it may have been our patches are due to the same cause, and mine is no longer needed? That would be nice. I never committed that patch (or there would be more context, sigh). I'll dig, there is a real problem in the compiler it seems. Thanks for the help so far! Segher
On 4/4/23 06:28, Michael Ellerman wrote: > The amdgpu driver builds some of its code with hard-float enabled, > whereas the rest of the kernel is built with soft-float. > > When building with 64-bit long double, if soft-float and hard-float > objects are linked together, the build fails due to incompatible ABI > tags. > > In the past there have been build errors in the amdgpu driver caused by > this, some of those were due to bad intermingling of soft & hard-float > code, but those issues have now all been fixed since commit c92b7fe0d92a > ("drm/amd/display: move remaining FPU code to dml folder"). > > However it's still possible for soft & hard-float objects to end up > linked together, if the amdgpu driver is built-in to the kernel along > with the test_emulate_step.c code, which uses soft-float. That happens > in an allyesconfig build. > > Currently those build errors are avoided because the amdgpu driver is > gated on 128-bit long double being enabled. But that's not a detail the > amdgpu driver should need to be aware of, and if another driver starts > using hard-float the same problem would occur. > > All versions of the 64-bit ABI specify that long-double is 128-bits. > However some compilers, notably the kernel.org ones, are built to use > 64-bit long double by default. > > Apart from this issue of soft vs hard-float, the kernel doesn't care > what size long double is. In particular the kernel using 128-bit long > double doesn't impact userspace's ability to use 64-bit long double, as > musl does. > > So always build the 64-bit kernel with 128-bit long double. That should > avoid any build errors due to the incompatible ABI tags. Excluding the > code that uses soft/hard-float, the vmlinux is identical with/without > the flag. > > It does mean any code which is incorrectly intermingling soft & > hard-float code will build without error, so those bugs will need to be > caught by testing rather than at build time. > > For more background see: > - commit d11219ad53dc ("amdgpu: disable powerpc support for the newer display engine") > - commit c653c591789b ("drm/amdgpu: Re-enable DCN for 64-bit powerpc") > - https://lore.kernel.org/r/dab9cbd8-2626-4b99-8098-31fe76397d2d@app.fastmail.com > > Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Hamza Mahfooz <hamza.mahfooz@amd.com> If you'd prefer to have this go through the amdgpu branch, please let me know. > --- > arch/powerpc/Kconfig | 4 ---- > arch/powerpc/Makefile | 1 + > drivers/gpu/drm/amd/display/Kconfig | 2 +- > 3 files changed, 2 insertions(+), 5 deletions(-) > > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig > index fc4e81dafca7..3fb2c2766139 100644 > --- a/arch/powerpc/Kconfig > +++ b/arch/powerpc/Kconfig > @@ -291,10 +291,6 @@ config PPC > # Please keep this list sorted alphabetically. > # > > -config PPC_LONG_DOUBLE_128 > - depends on PPC64 && ALTIVEC > - def_bool $(success,test "$(shell,echo __LONG_DOUBLE_128__ | $(CC) -E -P -)" = 1) > - > config PPC_BARRIER_NOSPEC > bool > default y > diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile > index 12447b2361e4..4343cca57cb3 100644 > --- a/arch/powerpc/Makefile > +++ b/arch/powerpc/Makefile > @@ -133,6 +133,7 @@ endif > endif > CFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mcmodel=medium,$(call cc-option,-mminimal-toc)) > CFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mno-pointers-to-nested-functions) > +CFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mlong-double-128) > > # Clang unconditionally reserves r2 on ppc32 and does not support the flag > # https://bugs.llvm.org/show_bug.cgi?id=39555 > diff --git a/drivers/gpu/drm/amd/display/Kconfig b/drivers/gpu/drm/amd/display/Kconfig > index 0c9bd0a53e60..e36261d546af 100644 > --- a/drivers/gpu/drm/amd/display/Kconfig > +++ b/drivers/gpu/drm/amd/display/Kconfig > @@ -8,7 +8,7 @@ config DRM_AMD_DC > depends on BROKEN || !CC_IS_CLANG || X86_64 || SPARC64 || ARM64 > select SND_HDA_COMPONENT if SND_HDA_CORE > # !CC_IS_CLANG: https://github.com/ClangBuiltLinux/linux/issues/1752 > - select DRM_AMD_DC_DCN if (X86 || PPC_LONG_DOUBLE_128 || (ARM64 && KERNEL_MODE_NEON && !CC_IS_CLANG)) > + select DRM_AMD_DC_DCN if (X86 || (PPC64 && ALTIVEC) || (ARM64 && KERNEL_MODE_NEON && !CC_IS_CLANG)) > help > Choose this option if you want to use the new display engine > support for AMDGPU. This adds required support for Vega and
Hamza Mahfooz <hamza.mahfooz@amd.com> writes: > On 4/4/23 06:28, Michael Ellerman wrote: >> The amdgpu driver builds some of its code with hard-float enabled, >> whereas the rest of the kernel is built with soft-float. >> >> When building with 64-bit long double, if soft-float and hard-float >> objects are linked together, the build fails due to incompatible ABI >> tags. >> >> In the past there have been build errors in the amdgpu driver caused by >> this, some of those were due to bad intermingling of soft & hard-float >> code, but those issues have now all been fixed since commit c92b7fe0d92a >> ("drm/amd/display: move remaining FPU code to dml folder"). >> >> However it's still possible for soft & hard-float objects to end up >> linked together, if the amdgpu driver is built-in to the kernel along >> with the test_emulate_step.c code, which uses soft-float. That happens >> in an allyesconfig build. >> >> Currently those build errors are avoided because the amdgpu driver is >> gated on 128-bit long double being enabled. But that's not a detail the >> amdgpu driver should need to be aware of, and if another driver starts >> using hard-float the same problem would occur. >> >> All versions of the 64-bit ABI specify that long-double is 128-bits. >> However some compilers, notably the kernel.org ones, are built to use >> 64-bit long double by default. >> >> Apart from this issue of soft vs hard-float, the kernel doesn't care >> what size long double is. In particular the kernel using 128-bit long >> double doesn't impact userspace's ability to use 64-bit long double, as >> musl does. >> >> So always build the 64-bit kernel with 128-bit long double. That should >> avoid any build errors due to the incompatible ABI tags. Excluding the >> code that uses soft/hard-float, the vmlinux is identical with/without >> the flag. >> >> It does mean any code which is incorrectly intermingling soft & >> hard-float code will build without error, so those bugs will need to be >> caught by testing rather than at build time. >> >> For more background see: >> - commit d11219ad53dc ("amdgpu: disable powerpc support for the newer display engine") >> - commit c653c591789b ("drm/amdgpu: Re-enable DCN for 64-bit powerpc") >> - https://lore.kernel.org/r/dab9cbd8-2626-4b99-8098-31fe76397d2d@app.fastmail.com >> >> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> > > Reviewed-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Thanks. > If you'd prefer to have this go through the amdgpu branch, please let > me know. I think it makes more sense to go via the powerpc tree, it will get more testing on powerpc that way. cheers
On Tue, 04 Apr 2023 20:28:47 +1000, Michael Ellerman wrote: > The amdgpu driver builds some of its code with hard-float enabled, > whereas the rest of the kernel is built with soft-float. > > When building with 64-bit long double, if soft-float and hard-float > objects are linked together, the build fails due to incompatible ABI > tags. > > [...] Applied to powerpc/next. [1/1] powerpc/64: Always build with 128-bit long double https://git.kernel.org/powerpc/c/78f0929884d4811c225fd2c57ecc602c84c07392 cheers
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index fc4e81dafca7..3fb2c2766139 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -291,10 +291,6 @@ config PPC # Please keep this list sorted alphabetically. # -config PPC_LONG_DOUBLE_128 - depends on PPC64 && ALTIVEC - def_bool $(success,test "$(shell,echo __LONG_DOUBLE_128__ | $(CC) -E -P -)" = 1) - config PPC_BARRIER_NOSPEC bool default y diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile index 12447b2361e4..4343cca57cb3 100644 --- a/arch/powerpc/Makefile +++ b/arch/powerpc/Makefile @@ -133,6 +133,7 @@ endif endif CFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mcmodel=medium,$(call cc-option,-mminimal-toc)) CFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mno-pointers-to-nested-functions) +CFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mlong-double-128) # Clang unconditionally reserves r2 on ppc32 and does not support the flag # https://bugs.llvm.org/show_bug.cgi?id=39555 diff --git a/drivers/gpu/drm/amd/display/Kconfig b/drivers/gpu/drm/amd/display/Kconfig index 0c9bd0a53e60..e36261d546af 100644 --- a/drivers/gpu/drm/amd/display/Kconfig +++ b/drivers/gpu/drm/amd/display/Kconfig @@ -8,7 +8,7 @@ config DRM_AMD_DC depends on BROKEN || !CC_IS_CLANG || X86_64 || SPARC64 || ARM64 select SND_HDA_COMPONENT if SND_HDA_CORE # !CC_IS_CLANG: https://github.com/ClangBuiltLinux/linux/issues/1752 - select DRM_AMD_DC_DCN if (X86 || PPC_LONG_DOUBLE_128 || (ARM64 && KERNEL_MODE_NEON && !CC_IS_CLANG)) + select DRM_AMD_DC_DCN if (X86 || (PPC64 && ALTIVEC) || (ARM64 && KERNEL_MODE_NEON && !CC_IS_CLANG)) help Choose this option if you want to use the new display engine support for AMDGPU. This adds required support for Vega and
The amdgpu driver builds some of its code with hard-float enabled, whereas the rest of the kernel is built with soft-float. When building with 64-bit long double, if soft-float and hard-float objects are linked together, the build fails due to incompatible ABI tags. In the past there have been build errors in the amdgpu driver caused by this, some of those were due to bad intermingling of soft & hard-float code, but those issues have now all been fixed since commit c92b7fe0d92a ("drm/amd/display: move remaining FPU code to dml folder"). However it's still possible for soft & hard-float objects to end up linked together, if the amdgpu driver is built-in to the kernel along with the test_emulate_step.c code, which uses soft-float. That happens in an allyesconfig build. Currently those build errors are avoided because the amdgpu driver is gated on 128-bit long double being enabled. But that's not a detail the amdgpu driver should need to be aware of, and if another driver starts using hard-float the same problem would occur. All versions of the 64-bit ABI specify that long-double is 128-bits. However some compilers, notably the kernel.org ones, are built to use 64-bit long double by default. Apart from this issue of soft vs hard-float, the kernel doesn't care what size long double is. In particular the kernel using 128-bit long double doesn't impact userspace's ability to use 64-bit long double, as musl does. So always build the 64-bit kernel with 128-bit long double. That should avoid any build errors due to the incompatible ABI tags. Excluding the code that uses soft/hard-float, the vmlinux is identical with/without the flag. It does mean any code which is incorrectly intermingling soft & hard-float code will build without error, so those bugs will need to be caught by testing rather than at build time. For more background see: - commit d11219ad53dc ("amdgpu: disable powerpc support for the newer display engine") - commit c653c591789b ("drm/amdgpu: Re-enable DCN for 64-bit powerpc") - https://lore.kernel.org/r/dab9cbd8-2626-4b99-8098-31fe76397d2d@app.fastmail.com Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> --- arch/powerpc/Kconfig | 4 ---- arch/powerpc/Makefile | 1 + drivers/gpu/drm/amd/display/Kconfig | 2 +- 3 files changed, 2 insertions(+), 5 deletions(-)