diff mbox series

powerpc/64: Always build with 128-bit long double

Message ID 20230404102847.3303623-1-mpe@ellerman.id.au (mailing list archive)
State Accepted
Commit 78f0929884d4811c225fd2c57ecc602c84c07392
Headers show
Series powerpc/64: Always build with 128-bit long double | expand

Checks

Context Check Description
snowpatch_ozlabs/github-powerpc_ppctests success Successfully ran 8 jobs.
snowpatch_ozlabs/github-powerpc_selftests success Successfully ran 8 jobs.
snowpatch_ozlabs/github-powerpc_kernel_qemu success Successfully ran 24 jobs.
snowpatch_ozlabs/github-powerpc_clang success Successfully ran 6 jobs.
snowpatch_ozlabs/github-powerpc_sparse success Successfully ran 4 jobs.

Commit Message

Michael Ellerman April 4, 2023, 10:28 a.m. UTC
The amdgpu driver builds some of its code with hard-float enabled,
whereas the rest of the kernel is built with soft-float.

When building with 64-bit long double, if soft-float and hard-float
objects are linked together, the build fails due to incompatible ABI
tags.

In the past there have been build errors in the amdgpu driver caused by
this, some of those were due to bad intermingling of soft & hard-float
code, but those issues have now all been fixed since commit c92b7fe0d92a
("drm/amd/display: move remaining FPU code to dml folder").

However it's still possible for soft & hard-float objects to end up
linked together, if the amdgpu driver is built-in to the kernel along
with the test_emulate_step.c code, which uses soft-float. That happens
in an allyesconfig build.

Currently those build errors are avoided because the amdgpu driver is
gated on 128-bit long double being enabled. But that's not a detail the
amdgpu driver should need to be aware of, and if another driver starts
using hard-float the same problem would occur.

All versions of the 64-bit ABI specify that long-double is 128-bits.
However some compilers, notably the kernel.org ones, are built to use
64-bit long double by default.

Apart from this issue of soft vs hard-float, the kernel doesn't care
what size long double is. In particular the kernel using 128-bit long
double doesn't impact userspace's ability to use 64-bit long double, as
musl does.

So always build the 64-bit kernel with 128-bit long double. That should
avoid any build errors due to the incompatible ABI tags. Excluding the
code that uses soft/hard-float, the vmlinux is identical with/without
the flag.

It does mean any code which is incorrectly intermingling soft &
hard-float code will build without error, so those bugs will need to be
caught by testing rather than at build time.

For more background see:
  - commit d11219ad53dc ("amdgpu: disable powerpc support for the newer display engine")
  - commit c653c591789b ("drm/amdgpu: Re-enable DCN for 64-bit powerpc")
  - https://lore.kernel.org/r/dab9cbd8-2626-4b99-8098-31fe76397d2d@app.fastmail.com

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
---
 arch/powerpc/Kconfig                | 4 ----
 arch/powerpc/Makefile               | 1 +
 drivers/gpu/drm/amd/display/Kconfig | 2 +-
 3 files changed, 2 insertions(+), 5 deletions(-)

Comments

Segher Boessenkool April 4, 2023, 3:10 p.m. UTC | #1
Hi!

On Tue, Apr 04, 2023 at 08:28:47PM +1000, Michael Ellerman wrote:
> The amdgpu driver builds some of its code with hard-float enabled,
> whereas the rest of the kernel is built with soft-float.
> 
> When building with 64-bit long double, if soft-float and hard-float
> objects are linked together, the build fails due to incompatible ABI
> tags.

> Currently those build errors are avoided because the amdgpu driver is
> gated on 128-bit long double being enabled. But that's not a detail the
> amdgpu driver should need to be aware of, and if another driver starts
> using hard-float the same problem would occur.

Well.  The kernel driver either has no business using long double (or
any other floating point even) at all, or it should know exactly what is
used: double precision, double-double, or quadruple precision.  Both of
the latter two are 128 bits.

> All versions of the 64-bit ABI specify that long-double is 128-bits.
> However some compilers, notably the kernel.org ones, are built to use
> 64-bit long double by default.

Mea culpa, I suppose?  But builddall doesn't force 64 bit explicitly.
I wonder how this happened?  Is it maybe a problem in the powerpc64le
config in GCC itself?  I have a patch from summer last year (Arnd's
toolchains are built without it) that does
+       powerpc64le-*)  TARGET_GCC_CONF=--with-long-double-128
Unfortunately I don't remember why I did that, and I never investigated
what the deeper problem is :-/

In either case, the kernel should always use specific types, not rely on
the toolchain to pick a type that may or may not work.  The correct size
floating point type alone is not enough, but it is a step in the right
direction certainly.

Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>


Segher
Michael Ellerman April 5, 2023, 5:32 a.m. UTC | #2
Segher Boessenkool <segher@kernel.crashing.org> writes:
> On Tue, Apr 04, 2023 at 08:28:47PM +1000, Michael Ellerman wrote:
>> The amdgpu driver builds some of its code with hard-float enabled,
>> whereas the rest of the kernel is built with soft-float.
>> 
>> When building with 64-bit long double, if soft-float and hard-float
>> objects are linked together, the build fails due to incompatible ABI
>> tags.
>
>> Currently those build errors are avoided because the amdgpu driver is
>> gated on 128-bit long double being enabled. But that's not a detail the
>> amdgpu driver should need to be aware of, and if another driver starts
>> using hard-float the same problem would occur.
>
> Well.  The kernel driver either has no business using long double (or
> any other floating point even) at all, or it should know exactly what is
> used: double precision, double-double, or quadruple precision.  Both of
> the latter two are 128 bits.

In a perfect world ... :)

>> All versions of the 64-bit ABI specify that long-double is 128-bits.
>> However some compilers, notably the kernel.org ones, are built to use
>> 64-bit long double by default.
>
> Mea culpa, I suppose?  But builddall doesn't force 64 bit explicitly.
> I wonder how this happened?  Is it maybe a problem in the powerpc64le
> config in GCC itself?

Not blaming anyone, just one of those things that happens. The
toolchains the distros (Ubuntu/Fedora) build all seem to use 128, but
possibly that's because someone told them to configure them that way at
some point.

> I have a patch from summer last year (Arnd's
> toolchains are built without it) that does
> +       powerpc64le-*)  TARGET_GCC_CONF=--with-long-double-128
> Unfortunately I don't remember why I did that, and I never investigated
> what the deeper problem is :-/

Last summer (aka winter) is when we first discovered this issue with the
long double size being implicated.

See:
  https://git.kernel.org/torvalds/c/c653c591789b3acfa4bf6ae45d5af4f330e50a91

So I guess that's what prompted your patch?

> In either case, the kernel should always use specific types, not rely on
> the toolchain to pick a type that may or may not work.  The correct size
> floating point type alone is not enough, but it is a step in the right
> direction certainly.
>
> Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>

Thanks.

cheers
Segher Boessenkool April 5, 2023, 11:04 a.m. UTC | #3
Hi!

On Wed, Apr 05, 2023 at 03:32:21PM +1000, Michael Ellerman wrote:
> Segher Boessenkool <segher@kernel.crashing.org> writes:
> > On Tue, Apr 04, 2023 at 08:28:47PM +1000, Michael Ellerman wrote:
> >> The amdgpu driver builds some of its code with hard-float enabled,
> >> whereas the rest of the kernel is built with soft-float.
> >> 
> >> When building with 64-bit long double, if soft-float and hard-float
> >> objects are linked together, the build fails due to incompatible ABI
> >> tags.
> >
> >> Currently those build errors are avoided because the amdgpu driver is
> >> gated on 128-bit long double being enabled. But that's not a detail the
> >> amdgpu driver should need to be aware of, and if another driver starts
> >> using hard-float the same problem would occur.
> >
> > Well.  The kernel driver either has no business using long double (or
> > any other floating point even) at all, or it should know exactly what is
> > used: double precision, double-double, or quadruple precision.  Both of
> > the latter two are 128 bits.
> 
> In a perfect world ... :)

Well, without it knowing what exactly it calculates, does this code have
any business running in kernel space?  Is it acceptable to just do
random things in the kernel?  I don't know the kernel code that uses
long double at all (and I'm afraid to look for fear of going blind), but
all this sounds like the 64-bit IEEE double precision floating point is
not good enough for some certain calculation, but 80-bit extended double
precision as used on x86 is.  That does make it likely that both of our
128-bit formats would work, but there are lots and lots of "buts".  To
start with, what does that code require wrt fp contraction (so, floating
multiply-add)?

All of this suggests that there should not be floating point code here
*at all*, it is harder to use it in any acceptable way than to just do
things in fixed point or scaled integer or whatever.

> >> All versions of the 64-bit ABI specify that long-double is 128-bits.
> >> However some compilers, notably the kernel.org ones, are built to use
> >> 64-bit long double by default.
> >
> > Mea culpa, I suppose?  But buildall doesn't force 64 bit explicitly.
> > I wonder how this happened?  Is it maybe a problem in the powerpc64le
> > config in GCC itself?
> 
> Not blaming anyone, just one of those things that happens.

Oh I didn't say anyone is blaming me.  I want to fix the problem, that
is all :-)

> The
> toolchains the distros (Ubuntu/Fedora) build all seem to use 128, but
> possibly that's because someone told them to configure them that way at
> some point.

No, or yes, depending on how you look at it?  Default configurations all
have 128-bit long double.  But buildall uses (almost) the same
configuration on all targets, namely:

        $GCC_SRC/configure \
        --target=$TARGET --enable-targets=all --prefix=$PREFIX \
        --enable-languages=c --without-headers --disable-bootstrap \
        --disable-nls --disable-threads --disable-shared \
        --disable-libmudflap --disable-libssp --disable-libgomp \
        --disable-decimal-float --disable-libquadmath \
        --disable-libatomic --disable-libcc1 --disable-libmpx

All of this is perfectly reasonable imnsho, but I guess the
--enable-targets=all causes the problem here?  That makes no sense, but
it is still my best guess.

> > I have a patch from summer last year (Arnd's
> > toolchains are built without it) that does
> > +       powerpc64le-*)  TARGET_GCC_CONF=--with-long-double-128
> > Unfortunately I don't remember why I did that, and I never investigated
> > what the deeper problem is :-/
> 
> Last summer (aka winter)

Oh right.  Last July :-)

> is when we first discovered this issue with the
> long double size being implicated.
> 
> See:
>   https://git.kernel.org/torvalds/c/c653c591789b3acfa4bf6ae45d5af4f330e50a91
> 
> So I guess that's what prompted your patch?

It was one day before my patch, maybe less than 12h even, so that could
be.  I don't update the kernel source automatically though (there are
50 to 100 build breaks every year, when things are in decent state I
tend to keep it for a while).  But it may have been our patches are due
to the same cause, and mine is no longer needed?  That would be nice.  I
never committed that patch (or there would be more context, sigh).

I'll dig, there is a real problem in the compiler it seems.  Thanks for
the help so far!


Segher
Hamza Mahfooz April 6, 2023, 5:12 p.m. UTC | #4
On 4/4/23 06:28, Michael Ellerman wrote:
> The amdgpu driver builds some of its code with hard-float enabled,
> whereas the rest of the kernel is built with soft-float.
> 
> When building with 64-bit long double, if soft-float and hard-float
> objects are linked together, the build fails due to incompatible ABI
> tags.
> 
> In the past there have been build errors in the amdgpu driver caused by
> this, some of those were due to bad intermingling of soft & hard-float
> code, but those issues have now all been fixed since commit c92b7fe0d92a
> ("drm/amd/display: move remaining FPU code to dml folder").
> 
> However it's still possible for soft & hard-float objects to end up
> linked together, if the amdgpu driver is built-in to the kernel along
> with the test_emulate_step.c code, which uses soft-float. That happens
> in an allyesconfig build.
> 
> Currently those build errors are avoided because the amdgpu driver is
> gated on 128-bit long double being enabled. But that's not a detail the
> amdgpu driver should need to be aware of, and if another driver starts
> using hard-float the same problem would occur.
> 
> All versions of the 64-bit ABI specify that long-double is 128-bits.
> However some compilers, notably the kernel.org ones, are built to use
> 64-bit long double by default.
> 
> Apart from this issue of soft vs hard-float, the kernel doesn't care
> what size long double is. In particular the kernel using 128-bit long
> double doesn't impact userspace's ability to use 64-bit long double, as
> musl does.
> 
> So always build the 64-bit kernel with 128-bit long double. That should
> avoid any build errors due to the incompatible ABI tags. Excluding the
> code that uses soft/hard-float, the vmlinux is identical with/without
> the flag.
> 
> It does mean any code which is incorrectly intermingling soft &
> hard-float code will build without error, so those bugs will need to be
> caught by testing rather than at build time.
> 
> For more background see:
>    - commit d11219ad53dc ("amdgpu: disable powerpc support for the newer display engine")
>    - commit c653c591789b ("drm/amdgpu: Re-enable DCN for 64-bit powerpc")
>    - https://lore.kernel.org/r/dab9cbd8-2626-4b99-8098-31fe76397d2d@app.fastmail.com
> 
> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

Reviewed-by: Hamza Mahfooz <hamza.mahfooz@amd.com>

If you'd prefer to have this go through the amdgpu branch, please let
me know.

> ---
>   arch/powerpc/Kconfig                | 4 ----
>   arch/powerpc/Makefile               | 1 +
>   drivers/gpu/drm/amd/display/Kconfig | 2 +-
>   3 files changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index fc4e81dafca7..3fb2c2766139 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -291,10 +291,6 @@ config PPC
>   	# Please keep this list sorted alphabetically.
>   	#
>   
> -config PPC_LONG_DOUBLE_128
> -	depends on PPC64 && ALTIVEC
> -	def_bool $(success,test "$(shell,echo __LONG_DOUBLE_128__ | $(CC) -E -P -)" = 1)
> -
>   config PPC_BARRIER_NOSPEC
>   	bool
>   	default y
> diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
> index 12447b2361e4..4343cca57cb3 100644
> --- a/arch/powerpc/Makefile
> +++ b/arch/powerpc/Makefile
> @@ -133,6 +133,7 @@ endif
>   endif
>   CFLAGS-$(CONFIG_PPC64)	+= $(call cc-option,-mcmodel=medium,$(call cc-option,-mminimal-toc))
>   CFLAGS-$(CONFIG_PPC64)	+= $(call cc-option,-mno-pointers-to-nested-functions)
> +CFLAGS-$(CONFIG_PPC64)	+= $(call cc-option,-mlong-double-128)
>   
>   # Clang unconditionally reserves r2 on ppc32 and does not support the flag
>   # https://bugs.llvm.org/show_bug.cgi?id=39555
> diff --git a/drivers/gpu/drm/amd/display/Kconfig b/drivers/gpu/drm/amd/display/Kconfig
> index 0c9bd0a53e60..e36261d546af 100644
> --- a/drivers/gpu/drm/amd/display/Kconfig
> +++ b/drivers/gpu/drm/amd/display/Kconfig
> @@ -8,7 +8,7 @@ config DRM_AMD_DC
>   	depends on BROKEN || !CC_IS_CLANG || X86_64 || SPARC64 || ARM64
>   	select SND_HDA_COMPONENT if SND_HDA_CORE
>   	# !CC_IS_CLANG: https://github.com/ClangBuiltLinux/linux/issues/1752
> -	select DRM_AMD_DC_DCN if (X86 || PPC_LONG_DOUBLE_128 || (ARM64 && KERNEL_MODE_NEON && !CC_IS_CLANG))
> +	select DRM_AMD_DC_DCN if (X86 || (PPC64 && ALTIVEC) || (ARM64 && KERNEL_MODE_NEON && !CC_IS_CLANG))
>   	help
>   	  Choose this option if you want to use the new display engine
>   	  support for AMDGPU. This adds required support for Vega and
Michael Ellerman April 12, 2023, 12:34 p.m. UTC | #5
Hamza Mahfooz <hamza.mahfooz@amd.com> writes:
> On 4/4/23 06:28, Michael Ellerman wrote:
>> The amdgpu driver builds some of its code with hard-float enabled,
>> whereas the rest of the kernel is built with soft-float.
>> 
>> When building with 64-bit long double, if soft-float and hard-float
>> objects are linked together, the build fails due to incompatible ABI
>> tags.
>> 
>> In the past there have been build errors in the amdgpu driver caused by
>> this, some of those were due to bad intermingling of soft & hard-float
>> code, but those issues have now all been fixed since commit c92b7fe0d92a
>> ("drm/amd/display: move remaining FPU code to dml folder").
>> 
>> However it's still possible for soft & hard-float objects to end up
>> linked together, if the amdgpu driver is built-in to the kernel along
>> with the test_emulate_step.c code, which uses soft-float. That happens
>> in an allyesconfig build.
>> 
>> Currently those build errors are avoided because the amdgpu driver is
>> gated on 128-bit long double being enabled. But that's not a detail the
>> amdgpu driver should need to be aware of, and if another driver starts
>> using hard-float the same problem would occur.
>> 
>> All versions of the 64-bit ABI specify that long-double is 128-bits.
>> However some compilers, notably the kernel.org ones, are built to use
>> 64-bit long double by default.
>> 
>> Apart from this issue of soft vs hard-float, the kernel doesn't care
>> what size long double is. In particular the kernel using 128-bit long
>> double doesn't impact userspace's ability to use 64-bit long double, as
>> musl does.
>> 
>> So always build the 64-bit kernel with 128-bit long double. That should
>> avoid any build errors due to the incompatible ABI tags. Excluding the
>> code that uses soft/hard-float, the vmlinux is identical with/without
>> the flag.
>> 
>> It does mean any code which is incorrectly intermingling soft &
>> hard-float code will build without error, so those bugs will need to be
>> caught by testing rather than at build time.
>> 
>> For more background see:
>>    - commit d11219ad53dc ("amdgpu: disable powerpc support for the newer display engine")
>>    - commit c653c591789b ("drm/amdgpu: Re-enable DCN for 64-bit powerpc")
>>    - https://lore.kernel.org/r/dab9cbd8-2626-4b99-8098-31fe76397d2d@app.fastmail.com
>> 
>> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
>
> Reviewed-by: Hamza Mahfooz <hamza.mahfooz@amd.com>

Thanks.

> If you'd prefer to have this go through the amdgpu branch, please let
> me know.

I think it makes more sense to go via the powerpc tree, it will get more
testing on powerpc that way.

cheers
Michael Ellerman April 26, 2023, 12:01 p.m. UTC | #6
On Tue, 04 Apr 2023 20:28:47 +1000, Michael Ellerman wrote:
> The amdgpu driver builds some of its code with hard-float enabled,
> whereas the rest of the kernel is built with soft-float.
> 
> When building with 64-bit long double, if soft-float and hard-float
> objects are linked together, the build fails due to incompatible ABI
> tags.
> 
> [...]

Applied to powerpc/next.

[1/1] powerpc/64: Always build with 128-bit long double
      https://git.kernel.org/powerpc/c/78f0929884d4811c225fd2c57ecc602c84c07392

cheers
diff mbox series

Patch

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index fc4e81dafca7..3fb2c2766139 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -291,10 +291,6 @@  config PPC
 	# Please keep this list sorted alphabetically.
 	#
 
-config PPC_LONG_DOUBLE_128
-	depends on PPC64 && ALTIVEC
-	def_bool $(success,test "$(shell,echo __LONG_DOUBLE_128__ | $(CC) -E -P -)" = 1)
-
 config PPC_BARRIER_NOSPEC
 	bool
 	default y
diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 12447b2361e4..4343cca57cb3 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -133,6 +133,7 @@  endif
 endif
 CFLAGS-$(CONFIG_PPC64)	+= $(call cc-option,-mcmodel=medium,$(call cc-option,-mminimal-toc))
 CFLAGS-$(CONFIG_PPC64)	+= $(call cc-option,-mno-pointers-to-nested-functions)
+CFLAGS-$(CONFIG_PPC64)	+= $(call cc-option,-mlong-double-128)
 
 # Clang unconditionally reserves r2 on ppc32 and does not support the flag
 # https://bugs.llvm.org/show_bug.cgi?id=39555
diff --git a/drivers/gpu/drm/amd/display/Kconfig b/drivers/gpu/drm/amd/display/Kconfig
index 0c9bd0a53e60..e36261d546af 100644
--- a/drivers/gpu/drm/amd/display/Kconfig
+++ b/drivers/gpu/drm/amd/display/Kconfig
@@ -8,7 +8,7 @@  config DRM_AMD_DC
 	depends on BROKEN || !CC_IS_CLANG || X86_64 || SPARC64 || ARM64
 	select SND_HDA_COMPONENT if SND_HDA_CORE
 	# !CC_IS_CLANG: https://github.com/ClangBuiltLinux/linux/issues/1752
-	select DRM_AMD_DC_DCN if (X86 || PPC_LONG_DOUBLE_128 || (ARM64 && KERNEL_MODE_NEON && !CC_IS_CLANG))
+	select DRM_AMD_DC_DCN if (X86 || (PPC64 && ALTIVEC) || (ARM64 && KERNEL_MODE_NEON && !CC_IS_CLANG))
 	help
 	  Choose this option if you want to use the new display engine
 	  support for AMDGPU. This adds required support for Vega and