Message ID | 20190621085822.1527-1-malat@debian.org (mailing list archive) |
---|---|
State | New |
Series | powerpc/lib/xor_vmx: Relax frame size for clang |
Context | Check | Description |
---|---|---|
snowpatch_ozlabs/apply_patch | success | Successfully applied on branch next (e610a466d16a086e321f0bd421e2fc75cff28605) |
snowpatch_ozlabs/build-ppc64le | success | Build succeeded |
snowpatch_ozlabs/build-ppc64be | success | Build succeeded |
snowpatch_ozlabs/build-ppc64e | success | Build succeeded |
snowpatch_ozlabs/build-pmac32 | success | Build succeeded |
snowpatch_ozlabs/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 9 lines checked |
On 21/06/2019 at 10:58, Mathieu Malaterre wrote:
> When building with clang-8 the frame size limit is hit:
>
> ../arch/powerpc/lib/xor_vmx.c:119:6: error: stack frame size of 1200 bytes in function '__xor_altivec_5' [-Werror,-Wframe-larger-than=]
>
> Follow the same approach as commit 9c87156cce5a ("powerpc/xmon: Relax
> frame size for clang") until a proper fix is implemented upstream in
> clang and relax requirement for clang.

With Clang 14 I get the following errors, but only with KASAN selected.

  CC      arch/powerpc/lib/xor_vmx.o
arch/powerpc/lib/xor_vmx.c:95:6: error: stack frame size (1040) exceeds limit (1024) in '__xor_altivec_4' [-Werror,-Wframe-larger-than]
void __xor_altivec_4(unsigned long bytes,
     ^
arch/powerpc/lib/xor_vmx.c:124:6: error: stack frame size (1312) exceeds limit (1024) in '__xor_altivec_5' [-Werror,-Wframe-larger-than]
void __xor_altivec_5(unsigned long bytes,
     ^

Is this patch still relevant? Or should the frame size be relaxed when KASAN is selected? After all, the stack size is multiplied by 2 when we have KASAN, so maybe the warning limit should be increased as well?

Thanks
Christophe

>
> Link: https://github.com/ClangBuiltLinux/linux/issues/563
> Cc: Joel Stanley <joel@jms.id.au>
> Signed-off-by: Mathieu Malaterre <malat@debian.org>
> ---
>  arch/powerpc/lib/Makefile | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile
> index c55f9c27bf79..b3f7d64caaf0 100644
> --- a/arch/powerpc/lib/Makefile
> +++ b/arch/powerpc/lib/Makefile
> @@ -58,5 +58,9 @@ obj-$(CONFIG_FTR_FIXUP_SELFTEST) += feature-fixups-test.o
>
>  obj-$(CONFIG_ALTIVEC) += xor_vmx.o xor_vmx_glue.o
>  CFLAGS_xor_vmx.o += -maltivec $(call cc-option,-mabi=altivec)
> +ifdef CONFIG_CC_IS_CLANG
> +# See https://github.com/ClangBuiltLinux/linux/issues/563
> +CFLAGS_xor_vmx.o += -Wframe-larger-than=4096
> +endif
>
>  obj-$(CONFIG_PPC64) += $(obj64-y)
Christophe Leroy <christophe.leroy@csgroup.eu> writes:
> On 21/06/2019 at 10:58, Mathieu Malaterre wrote:
>> When building with clang-8 the frame size limit is hit:
>>
>> ../arch/powerpc/lib/xor_vmx.c:119:6: error: stack frame size of 1200 bytes in function '__xor_altivec_5' [-Werror,-Wframe-larger-than=]
>>
>> Follow the same approach as commit 9c87156cce5a ("powerpc/xmon: Relax
>> frame size for clang") until a proper fix is implemented upstream in
>> clang and relax requirement for clang.
>
> With Clang 14 I get the following errors, but only with KASAN selected.
>
>   CC      arch/powerpc/lib/xor_vmx.o
> arch/powerpc/lib/xor_vmx.c:95:6: error: stack frame size (1040) exceeds
> limit (1024) in '__xor_altivec_4' [-Werror,-Wframe-larger-than]
> void __xor_altivec_4(unsigned long bytes,
>      ^
> arch/powerpc/lib/xor_vmx.c:124:6: error: stack frame size (1312) exceeds
> limit (1024) in '__xor_altivec_5' [-Werror,-Wframe-larger-than]
> void __xor_altivec_5(unsigned long bytes,
>      ^

That's a 32-bit build?

> Is this patch still relevant?

The clang issue was closed because a different change fixed the issue:

  https://github.com/ClangBuiltLinux/linux/issues/563

> Or should the frame size be relaxed when KASAN is selected? After all, the
> stack size is multiplied by 2 when we have KASAN, so maybe the warning
> limit should be increased as well?

Yeah that would make some sense.

On 64-bit the largest frame in that file is 1424, which is below the default 2048 byte limit.

So maybe just increase it for 32-bit && KASAN.

What would be nice is if the FRAME_WARN value could be calculated as a percentage of the THREAD_SHIFT, but that's not easily doable with the way things are structured in Kconfig.

cheers
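As a rough illustration of tying FRAME_WARN to the stack size, the sketch below reads "a percentage of the THREAD_SHIFT" as a fixed fraction of the thread stack size, taken as 1 << THREAD_SHIFT. The helper name, the per-mille parameter and the shift values 13 and 14 used in the note afterwards are assumptions made for the example. Kconfig cannot evaluate an expression like this, which is the limitation the mail points out.

```c
#include <assert.h>

/*
 * Hypothetical helper (not a kernel API): derive a frame-size warning
 * limit as a fraction, in per-mille, of the thread stack size, where the
 * stack size is assumed to be 1 << thread_shift bytes.
 */
unsigned long frame_warn_from_shift(unsigned int thread_shift,
                                    unsigned int per_mille)
{
    /* Keep the multiplication below within a 32-bit unsigned long. */
    assert(thread_shift <= 20);
    assert(per_mille <= 1000);

    unsigned long stack_bytes = 1UL << thread_shift;

    return stack_bytes * per_mille / 1000;
}
```

With a fraction of 125 per mille, an assumed shift of 13 gives 1024 bytes and a shift of 14 gives 2048 bytes, so a limit scaled this way would double whenever the stack size doubles.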
On 08/09/2022 at 02:27, Michael Ellerman wrote:
> Christophe Leroy <christophe.leroy@csgroup.eu> writes:
>> On 21/06/2019 at 10:58, Mathieu Malaterre wrote:
>>> When building with clang-8 the frame size limit is hit:
>>>
>>> ../arch/powerpc/lib/xor_vmx.c:119:6: error: stack frame size of 1200 bytes in function '__xor_altivec_5' [-Werror,-Wframe-larger-than=]
>>>
>>> Follow the same approach as commit 9c87156cce5a ("powerpc/xmon: Relax
>>> frame size for clang") until a proper fix is implemented upstream in
>>> clang and relax requirement for clang.
>>
>> With Clang 14 I get the following errors, but only with KASAN selected.
>>
>>   CC      arch/powerpc/lib/xor_vmx.o
>> arch/powerpc/lib/xor_vmx.c:95:6: error: stack frame size (1040) exceeds
>> limit (1024) in '__xor_altivec_4' [-Werror,-Wframe-larger-than]
>> void __xor_altivec_4(unsigned long bytes,
>>      ^
>> arch/powerpc/lib/xor_vmx.c:124:6: error: stack frame size (1312) exceeds
>> limit (1024) in '__xor_altivec_5' [-Werror,-Wframe-larger-than]
>> void __xor_altivec_5(unsigned long bytes,
>>      ^
>
> That's a 32-bit build?

Yes, pmac32_defconfig

>
>> Is this patch still relevant?
>
> The clang issue was closed because a different change fixed the issue:
>
>   https://github.com/ClangBuiltLinux/linux/issues/563
>
>> Or should the frame size be relaxed when KASAN is selected? After all, the
>> stack size is multiplied by 2 when we have KASAN, so maybe the warning
>> limit should be increased as well?
>
> Yeah that would make some sense.
>
> On 64-bit the largest frame in that file is 1424, which is below the
> default 2048 byte limit.
>
> So maybe just increase it for 32-bit && KASAN.
>
> What would be nice is if the FRAME_WARN value could be calculated as a
> percentage of the THREAD_SHIFT, but that's not easily doable with the
> way things are structured in Kconfig.
>

Looking at it more deeply, I see strange things.

What is that frame size? I thought it was the number of bytes r1 is decremented at the beginning of the function, but it seems not, at least on GCC. It seems GCC subtracts 112 bytes while clang doesn't.

I set CONFIG_FRAME_WARN to 8 and with GCC and without KASAN, I get no warning, although I have:

00000000 <__xor_altivec_2>:
       0:  94 21 ff f0   stwu    r1,-16(r1)

00000078 <__xor_altivec_3>:
      78:  94 21 ff f0   stwu    r1,-16(r1)

0000010c <__xor_altivec_4>:
     10c:  94 21 ff f0   stwu    r1,-16(r1)

000001c4 <__xor_altivec_5>:
     1c4:  94 21 ff e0   stwu    r1,-32(r1)

With GCC and inline KASAN I get:

arch/powerpc/lib/xor_vmx.c: In function '__xor_altivec_2':
arch/powerpc/lib/xor_vmx.c:69:1: warning: the frame size of 96 bytes is larger than 8 bytes [-Wframe-larger-than=]
arch/powerpc/lib/xor_vmx.c: In function '__xor_altivec_3':
arch/powerpc/lib/xor_vmx.c:93:1: warning: the frame size of 128 bytes is larger than 8 bytes [-Wframe-larger-than=]
arch/powerpc/lib/xor_vmx.c: In function '__xor_altivec_4':
arch/powerpc/lib/xor_vmx.c:122:1: warning: the frame size of 80 bytes is larger than 8 bytes [-Wframe-larger-than=]
arch/powerpc/lib/xor_vmx.c: In function '__xor_altivec_5':
arch/powerpc/lib/xor_vmx.c:156:1: warning: the frame size of 128 bytes is larger than 8 bytes [-Wframe-larger-than=]

00000000 <__xor_altivec_2>:
       0:  94 21 ff 30   stwu    r1,-208(r1)

00000458 <__xor_altivec_3>:
     458:  94 21 ff 00   stwu    r1,-256(r1)

00000b94 <__xor_altivec_4>:
     b94:  94 21 fe b0   stwu    r1,-336(r1)

000015b8 <__xor_altivec_5>:
    15b8:  94 21 fe 60   stwu    r1,-416(r1)

With Clang and without KASAN I get:

  CC      arch/powerpc/lib/xor_vmx.o
arch/powerpc/lib/xor_vmx.c:52:6: warning: stack frame size (144) exceeds limit (8) in '__xor_altivec_2' [-Wframe-larger-than]
void __xor_altivec_2(unsigned long bytes,
arch/powerpc/lib/xor_vmx.c:71:6: warning: stack frame size (144) exceeds limit (8) in '__xor_altivec_3' [-Wframe-larger-than]
void __xor_altivec_3(unsigned long bytes,
arch/powerpc/lib/xor_vmx.c:95:6: warning: stack frame size (160) exceeds limit (8) in '__xor_altivec_4' [-Wframe-larger-than]
void __xor_altivec_4(unsigned long bytes,
arch/powerpc/lib/xor_vmx.c:124:6: warning: stack frame size (144) exceeds limit (8) in '__xor_altivec_5' [-Wframe-larger-than]
void __xor_altivec_5(unsigned long bytes,

00000000 <__xor_altivec_2>:
       0:  94 21 ff 70   stwu    r1,-144(r1)

00000528 <__xor_altivec_3>:
     528:  94 21 ff 70   stwu    r1,-144(r1)

00000c4c <__xor_altivec_4>:
     c4c:  94 21 ff 60   stwu    r1,-160(r1)

000015a4 <__xor_altivec_5>:
    15a4:  94 21 ff 70   stwu    r1,-144(r1)

With Clang and with inline KASAN I get:

arch/powerpc/lib/xor_vmx.c:52:6: warning: stack frame size (512) exceeds limit (8) in '__xor_altivec_2' [-Wframe-larger-than]
void __xor_altivec_2(unsigned long bytes,
arch/powerpc/lib/xor_vmx.c:71:6: warning: stack frame size (768) exceeds limit (8) in '__xor_altivec_3' [-Wframe-larger-than]
void __xor_altivec_3(unsigned long bytes,
arch/powerpc/lib/xor_vmx.c:95:6: warning: stack frame size (1040) exceeds limit (8) in '__xor_altivec_4' [-Wframe-larger-than]
void __xor_altivec_4(unsigned long bytes,
arch/powerpc/lib/xor_vmx.c:124:6: warning: stack frame size (1312) exceeds limit (8) in '__xor_altivec_5' [-Wframe-larger-than]
void __xor_altivec_5(unsigned long bytes,

00000000 <__xor_altivec_2>:
       8:  94 21 fe 00   stwu    r1,-512(r1)

00000a24 <__xor_altivec_3>:
     a2c:  94 21 fd 00   stwu    r1,-768(r1)

000019a4 <__xor_altivec_4>:
    19ac:  94 21 fb f0   stwu    r1,-1040(r1)

00002f20 <__xor_altivec_5>:
    2f28:  94 21 fa e0   stwu    r1,-1312(r1)

So it seems that GCC and Clang don't warn on the same thing, is that expected? GCC subtracts 112 bytes, which is the minimum frame size on a ppc64, but here I'm building a ppc32 kernel, where the min frame size is 16.

And Clang is still using a lot more stack than GCC.

Christophe
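The stwu lines in the listings above can be cross-checked by decoding the instruction words by hand. The sketch below assumes the standard D-form layout (a 6-bit primary opcode, two 5-bit register fields and a 16-bit signed displacement) with primary opcode 37 for stwu. The function name is made up for the example.

```c
#include <stdint.h>

/*
 * Hypothetical helper: if insn encodes "stwu r1,-N(r1)", return N,
 * the number of bytes the stack pointer is moved down. Return -1 for
 * any other instruction word.
 */
int stwu_r1_frame_bytes(uint32_t insn)
{
    uint32_t opcode = insn >> 26;          /* primary opcode, 6 bits */
    uint32_t rs = (insn >> 21) & 0x1f;     /* source register */
    uint32_t ra = (insn >> 16) & 0x1f;     /* base register */
    int32_t d = (int32_t)(insn & 0xffff);  /* raw 16-bit displacement */

    /* Sign-extend the displacement from 16 bits. */
    if (d & 0x8000)
        d -= 0x10000;

    if (opcode != 37 || rs != 1 || ra != 1 || d >= 0)
        return -1;

    return (int)-d;
}
```

Decoding 94 21 fa e0 (0x9421fae0) gives 1312, the number Clang reports for __xor_altivec_5 with KASAN, while 94 21 ff 30 gives 208 against the 96 bytes GCC reports for __xor_altivec_2. Across the four GCC functions built with KASAN, the gap between the stwu displacement and the reported size is 112, 128, 256 and 288 bytes respectively.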
On Thu, Sep 08, 2022 at 06:00:24AM +0000, Christophe Leroy wrote:
> Looking at it more deeply, I see strange things.

I'll have to see full generated machine code to be able to see strange things, there isn't enough information at all here yet. Sorry.

Use private mail if it is too big or uninteresting for the list :-)

> What is that frame size? I thought it was the number of bytes r1 is
> decremented at the beginning of the function, but it seems not, at least
> on GCC. It seems GCC subtracts 112 bytes while clang doesn't.

That is the vars size + the fixed size + the size of the parameter save area + the size of the regs save area, rounded up to a multiple of 16. Fixed size is 8 on 32-bit PowerPC ELF. Frame size used by GCC here is just the vars size.

> So it seems that GCC and Clang don't warn on the same thing, is that
> expected? GCC subtracts 112 bytes, which is the minimum frame size on a
> ppc64, but here I'm building a ppc32 kernel, where the min frame size is 16.

I need to see the generated code to make sense of what is happening here. It sounds like it is doing varargs calls or similar expensive stack juggling. Or just saving a boatload of registers on the stack.

> And Clang is still using a lot more stack than GCC.

Good to hear! Well, good for GCC, anyway ;-)

Segher
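The frame-size formula given in this mail can be written out as a small calculation. This is only a sketch of the arithmetic as described (variables + fixed area + parameter save area + register save area, rounded up to 16, with a fixed area of 8 bytes on 32-bit PowerPC ELF). The function and macro names are invented for the example, and the component sizes passed in are inputs, not values taken from the actual object file.

```c
#include <assert.h>
#include <stddef.h>

#define PPC32_FIXED_AREA  8   /* fixed size on 32-bit PowerPC ELF, per the mail */
#define PPC32_FRAME_ALIGN 16  /* frames are rounded up to a multiple of 16 */

/* Round n up to a multiple of align, which must be a power of two. */
size_t round_up_pow2(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Total frame size from its components, following the formula above. */
size_t ppc32_frame_bytes(size_t vars, size_t param_save, size_t regs_save)
{
    size_t raw = vars + PPC32_FIXED_AREA + param_save + regs_save;

    return round_up_pow2(raw, PPC32_FRAME_ALIGN);
}
```

With nothing to save the result is 16, the minimum 32-bit frame mentioned earlier in the thread. How a given frame, such as the 208 bytes of __xor_altivec_2, actually splits between locals and saved registers cannot be determined from the mails, so any specific breakdown would be a guess.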
On Thu, Sep 8, 2022, at 2:27 AM, Michael Ellerman wrote:
> Christophe Leroy <christophe.leroy@csgroup.eu> writes:
>
> Yeah that would make some sense.
>
> On 64-bit the largest frame in that file is 1424, which is below the
> default 2048 byte limit.
>
> So maybe just increase it for 32-bit && KASAN.
>
> What would be nice is if the FRAME_WARN value could be calculated as a
> percentage of the THREAD_SHIFT, but that's not easily doable with the
> way things are structured in Kconfig.
>

Increasing the warning limit slightly for 32-bit with CONFIG_KASAN_STACK makes sense, but there are a lot of related concerns:

- I was hoping to still stay under 1280 bytes for the warning limit, so that even with KASAN_STACK enabled, we are able to catch warnings in functions that use a stupid amount of local variables, without getting too many false positives.

- if the XOR code has its frame size explode like this, it's probably an indication of the compiler doing something wrong, not the kernel code. The result is likely that the "optimized" XOR implementation is slower than the default version, and the kernel will pick the other one at boot time. This needs to be confirmed of course, but an easier workaround for this instance might be to just disable the xor_vmx module when KASAN_STACK is set.

- The warning limit on 32-bit is actually 2048 bytes when GCC_PLUGIN_LATENT_ENTROPY is set. I think this is a mistake and we should lower /that/ limit instead, but a side-effect here is that an allmodconfig kernel build with gcc will fail to warn about bugs that exist both with gcc and clang, while clang complains about them.

Arnd
Hi!

On Thu, Sep 08, 2022 at 05:07:24PM +0200, Arnd Bergmann wrote:
> - if the XOR code has its frame size explode like this, it's
> probably an indication of the compiler doing something wrong,
> not the kernel code.

On the contrary, it is most likely an indication that the kernel code wants something unreasonable. Like, having 20 variables live at the same time, but still wanting nicely scheduled machine code generated.

But I suspect GCC unrolled the loops here, even? Best way to prevent that here is to put an option in the Makefile, for these files. We don't want any of this unrolled after all?

Or, alternatively, remove all the manual unrolling from this code, let GCC do its thing, without painting it into a corner.

> The result is likely that the "optimized"
> XOR implementation is slower than the default version as a
> result, and the kernel will pick the other one at boot time.

Yes. So it's self-healing even, of a sort :-)

Segher
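For the second alternative, dropping the manual unrolling, a plain loop of the following shape leaves scheduling and unrolling decisions to the compiler. This is a scalar sketch with a made-up function name, not the AltiVec code in xor_vmx.c. Whether a loop like this would keep the frame small or perform acceptably has not been measured here.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical plain XOR of two buffers: p1[i] ^= p2[i] for every word.
 * No manual unrolling; the compiler is free to vectorise or unroll.
 */
void xor_2_plain(size_t bytes, unsigned long *restrict p1,
                 const unsigned long *restrict p2)
{
    /* The sketch only handles whole words. */
    assert(bytes % sizeof(unsigned long) == 0);

    size_t words = bytes / sizeof(unsigned long);

    for (size_t i = 0; i < words; i++)
        p1[i] ^= p2[i];
}
```

The only live values in the loop are the two pointers, the index and the word count, which is the kind of low register pressure the mail argues for.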
On 08/09/2022 at 15:48, Segher Boessenkool wrote:
> On Thu, Sep 08, 2022 at 06:00:24AM +0000, Christophe Leroy wrote:
>> Looking at it more deeply, I see strange things.
>
> I'll have to see full generated machine code to be able to see strange
> things, there isn't enough information at all here yet. Sorry.

Well, what I call strange is the fact that with GCC the number of bytes reported by -Wframe-larger-than doesn't match the offset used for the stwu at the start of the function, while it does with clang.

>
> Use private mail if it is too big or uninteresting for the list :-)
>
>> What is that frame size? I thought it was the number of bytes r1 is
>> decremented at the beginning of the function, but it seems not, at least
>> on GCC. It seems GCC subtracts 112 bytes while clang doesn't.
>
> That is the vars size + the fixed size + the size of the parameter
> save area + the size of the regs save area, rounded up to a multiple
> of 16. Fixed size is 8 on 32-bit PowerPC ELF. Frame size used by GCC
> here is just the vars size.

Ok, so does it mean that the stack utilisation is underestimated when using GCC? Or is it clang that overestimates it?

>
>> So it seems that GCC and Clang don't warn on the same thing, is that
>> expected? GCC subtracts 112 bytes, which is the minimum frame size on a
>> ppc64, but here I'm building a ppc32 kernel, where the min frame size is 16.
>
> I need to see the generated code to make sense of what is happening
> here. It sounds like it is doing varargs calls or similar expensive
> stack juggling. Or just saving a boatload of registers on the stack.
>

Ok, I'll send it to you. But once again, I don't mind what the code really looks like, I'm just worried that GCC doesn't report the entire stack usage.

Christophe
diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile
index c55f9c27bf79..b3f7d64caaf0 100644
--- a/arch/powerpc/lib/Makefile
+++ b/arch/powerpc/lib/Makefile
@@ -58,5 +58,9 @@ obj-$(CONFIG_FTR_FIXUP_SELFTEST) += feature-fixups-test.o
 
 obj-$(CONFIG_ALTIVEC) += xor_vmx.o xor_vmx_glue.o
 CFLAGS_xor_vmx.o += -maltivec $(call cc-option,-mabi=altivec)
+ifdef CONFIG_CC_IS_CLANG
+# See https://github.com/ClangBuiltLinux/linux/issues/563
+CFLAGS_xor_vmx.o += -Wframe-larger-than=4096
+endif
 
 obj-$(CONFIG_PPC64) += $(obj64-y)
When building with clang-8 the frame size limit is hit:

  ../arch/powerpc/lib/xor_vmx.c:119:6: error: stack frame size of 1200 bytes in function '__xor_altivec_5' [-Werror,-Wframe-larger-than=]

Follow the same approach as commit 9c87156cce5a ("powerpc/xmon: Relax frame size for clang") until a proper fix is implemented upstream in clang and relax requirement for clang.

Link: https://github.com/ClangBuiltLinux/linux/issues/563
Cc: Joel Stanley <joel@jms.id.au>
Signed-off-by: Mathieu Malaterre <malat@debian.org>
---
 arch/powerpc/lib/Makefile | 4 ++++
 1 file changed, 4 insertions(+)