Message ID | aea33b4813a26bdb9378b5f273f00bd5d4abe240.1638857364.git.christophe.leroy@csgroup.eu (mailing list archive) |
---|---|
State | Accepted |
Headers | show |
Series | powerpc/603: Fix boot failure with DEBUG_PAGEALLOC and KFENCE | expand |
Context | Check | Description |
---|---|---|
snowpatch_ozlabs/github-powerpc_ppctests | success | Successfully ran 8 jobs. |
snowpatch_ozlabs/github-powerpc_selftests | success | Successfully ran 8 jobs. |
snowpatch_ozlabs/github-powerpc_sparse | success | Successfully ran 4 jobs. |
snowpatch_ozlabs/github-powerpc_kernel_qemu | fail | boot (ppc64le_guest_defconfig, pseries+p8+tcg, pseries+p9+tcg, qemu-system-ppc64, ppc64le-rootfs.... failed at step Run qemu-pseries+p8+tcg with fedora-34 build kernel. |
snowpatch_ozlabs/github-powerpc_clang | success | Successfully ran 7 jobs. |
On Tue, 2021-12-07 at 06:10 +0000, Christophe Leroy wrote: Hello, With the patch applied and CONFIG_DEBUG_PAGEALLOC=y CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT=y CONFIG_DEBUG_VM=y I get tons of this during boot: [ 0.000000] Dentry cache hash table entries: 262144 (order: 8, 1048576 bytes, linear) [ 0.000000] Inode-cache hash table entries: 131072 (order: 7, 524288 bytes, linear) [ 0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off [ 0.000000] ------------[ cut here ]------------ [ 0.000000] WARNING: CPU: 0 PID: 0 at arch/powerpc/mm/pgtable.c:194 set_pte_at+0x18/0x160 [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.15.0+ #442 [ 0.000000] NIP: 80015ebc LR: 80016728 CTR: 800166e4 [ 0.000000] REGS: 80751dd0 TRAP: 0700 Not tainted (5.15.0+) [ 0.000000] MSR: 00021032 <ME,IR,DR,RI> CR: 42228882 XER: 20000000 [ 0.000000] [ 0.000000] GPR00: 800b8dc8 80751e80 806c6300 807311d8 807a1000 8ffffe84 80751ea8 00000000 [ 0.000000] GPR08: 007a1591 00000001 007a1180 00000000 42224882 00000000 3ff9c608 3fffd79c [ 0.000000] GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 800166e4 807a2000 [ 0.000000] GPR24: 807a1fff 807311d8 807311d8 807a2000 80768804 00000000 807a1000 007a1180 [ 0.000000] NIP [80015ebc] set_pte_at+0x18/0x160 [ 0.000000] LR [80016728] set_page_attr+0x44/0xc0 [ 0.000000] Call Trace: [ 0.000000] [80751e80] [80058570] console_unlock+0x340/0x428 (unreliable) [ 0.000000] [80751ea0] [00000000] 0x0 [ 0.000000] [80751ec0] [800b8dc8] __apply_to_page_range+0x144/0x2a8 [ 0.000000] [80751f00] [80016918] __kernel_map_pages+0x54/0x64 [ 0.000000] [80751f10] [800cfeb0] __free_pages_ok+0x1b0/0x440 [ 0.000000] [80751f50] [805cfc8c] memblock_free_all+0x1d8/0x274 [ 0.000000] [80751f90] [805c5e0c] mem_init+0x3c/0xd0 [ 0.000000] [80751fb0] [805c0bdc] start_kernel+0x404/0x5c4 [ 0.000000] [80751ff0] [000033f0] 0x33f0 [ 0.000000] Instruction dump: [ 0.000000] 7c630034 83e1000c 5463d97e 7c0803a6 38210010 4e800020 9421ffe0 93e1001c [ 0.000000] 83e60000 81250000 71290001 41820014 <0fe00000> 7c0802a6 93c10018 90010024
Le 07/12/2021 à 11:34, Maxime Bizon a écrit : > > On Tue, 2021-12-07 at 06:10 +0000, Christophe Leroy wrote: > > Hello, > > With the patch applied and > > CONFIG_DEBUG_PAGEALLOC=y > CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT=y > CONFIG_DEBUG_VM=y > > I get tons of this during boot: > > [ 0.000000] Dentry cache hash table entries: 262144 (order: 8, 1048576 bytes, linear) > [ 0.000000] Inode-cache hash table entries: 131072 (order: 7, 524288 bytes, linear) > [ 0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off > [ 0.000000] ------------[ cut here ]------------ > [ 0.000000] WARNING: CPU: 0 PID: 0 at arch/powerpc/mm/pgtable.c:194 set_pte_at+0x18/0x160 > [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.15.0+ #442 > [ 0.000000] NIP: 80015ebc LR: 80016728 CTR: 800166e4 > [ 0.000000] REGS: 80751dd0 TRAP: 0700 Not tainted (5.15.0+) > [ 0.000000] MSR: 00021032 <ME,IR,DR,RI> CR: 42228882 XER: 20000000 > [ 0.000000] > [ 0.000000] GPR00: 800b8dc8 80751e80 806c6300 807311d8 807a1000 8ffffe84 80751ea8 00000000 > [ 0.000000] GPR08: 007a1591 00000001 007a1180 00000000 42224882 00000000 3ff9c608 3fffd79c > [ 0.000000] GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 800166e4 807a2000 > [ 0.000000] GPR24: 807a1fff 807311d8 807311d8 807a2000 80768804 00000000 807a1000 007a1180 > [ 0.000000] NIP [80015ebc] set_pte_at+0x18/0x160 > [ 0.000000] LR [80016728] set_page_attr+0x44/0xc0 > [ 0.000000] Call Trace: > [ 0.000000] [80751e80] [80058570] console_unlock+0x340/0x428 (unreliable) > [ 0.000000] [80751ea0] [00000000] 0x0 > [ 0.000000] [80751ec0] [800b8dc8] __apply_to_page_range+0x144/0x2a8 > [ 0.000000] [80751f00] [80016918] __kernel_map_pages+0x54/0x64 > [ 0.000000] [80751f10] [800cfeb0] __free_pages_ok+0x1b0/0x440 > [ 0.000000] [80751f50] [805cfc8c] memblock_free_all+0x1d8/0x274 > [ 0.000000] [80751f90] [805c5e0c] mem_init+0x3c/0xd0 > [ 0.000000] [80751fb0] [805c0bdc] start_kernel+0x404/0x5c4 > [ 0.000000] [80751ff0] [000033f0] 0x33f0 > [ 0.000000] Instruction dump: > [ 0.000000] 7c630034 83e1000c 5463d97e 7c0803a6 38210010 4e800020 9421ffe0 93e1001c > [ 0.000000] 83e60000 81250000 71290001 41820014 <0fe00000> 7c0802a6 93c10018 90010024 > > That's unrelated to this patch. The problem is linked to patch c988cfd38e48 ("powerpc/32: use set_memory_attr()"), which changed from using __set_pte_at() to using set_memory_attr() which uses set_pte_at(). set_pte_at() has additional checks and shall not be used to updating an existing PTE. Wondering if I should just use __set_pte_at() instead like in the past, or do like commit 9f7853d7609d ("powerpc/mm: Fix set_memory_*() against concurrent accesses") and use pte_update() Michael, Aneesh, any suggestion ? Thanks Christophe
Christophe Leroy <christophe.leroy@csgroup.eu> writes: > Le 07/12/2021 à 11:34, Maxime Bizon a écrit : >> >> On Tue, 2021-12-07 at 06:10 +0000, Christophe Leroy wrote: >> >> Hello, >> >> With the patch applied and >> >> CONFIG_DEBUG_PAGEALLOC=y >> CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT=y >> CONFIG_DEBUG_VM=y >> >> I get tons of this during boot: >> >> [ 0.000000] Dentry cache hash table entries: 262144 (order: 8, 1048576 bytes, linear) >> [ 0.000000] Inode-cache hash table entries: 131072 (order: 7, 524288 bytes, linear) >> [ 0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off >> [ 0.000000] ------------[ cut here ]------------ >> [ 0.000000] WARNING: CPU: 0 PID: 0 at arch/powerpc/mm/pgtable.c:194 set_pte_at+0x18/0x160 >> [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.15.0+ #442 >> [ 0.000000] NIP: 80015ebc LR: 80016728 CTR: 800166e4 >> [ 0.000000] REGS: 80751dd0 TRAP: 0700 Not tainted (5.15.0+) >> [ 0.000000] MSR: 00021032 <ME,IR,DR,RI> CR: 42228882 XER: 20000000 >> [ 0.000000] >> [ 0.000000] GPR00: 800b8dc8 80751e80 806c6300 807311d8 807a1000 8ffffe84 80751ea8 00000000 >> [ 0.000000] GPR08: 007a1591 00000001 007a1180 00000000 42224882 00000000 3ff9c608 3fffd79c >> [ 0.000000] GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 800166e4 807a2000 >> [ 0.000000] GPR24: 807a1fff 807311d8 807311d8 807a2000 80768804 00000000 807a1000 007a1180 >> [ 0.000000] NIP [80015ebc] set_pte_at+0x18/0x160 >> [ 0.000000] LR [80016728] set_page_attr+0x44/0xc0 >> [ 0.000000] Call Trace: >> [ 0.000000] [80751e80] [80058570] console_unlock+0x340/0x428 (unreliable) >> [ 0.000000] [80751ea0] [00000000] 0x0 >> [ 0.000000] [80751ec0] [800b8dc8] __apply_to_page_range+0x144/0x2a8 >> [ 0.000000] [80751f00] [80016918] __kernel_map_pages+0x54/0x64 >> [ 0.000000] [80751f10] [800cfeb0] __free_pages_ok+0x1b0/0x440 >> [ 0.000000] [80751f50] [805cfc8c] memblock_free_all+0x1d8/0x274 >> [ 0.000000] [80751f90] [805c5e0c] mem_init+0x3c/0xd0 >> [ 0.000000] [80751fb0] [805c0bdc] start_kernel+0x404/0x5c4 >> [ 0.000000] [80751ff0] [000033f0] 0x33f0 >> [ 0.000000] Instruction dump: >> [ 0.000000] 7c630034 83e1000c 5463d97e 7c0803a6 38210010 4e800020 9421ffe0 93e1001c >> [ 0.000000] 83e60000 81250000 71290001 41820014 <0fe00000> 7c0802a6 93c10018 90010024 >> >> > > That's unrelated to this patch. > > The problem is linked to patch c988cfd38e48 ("powerpc/32: use > set_memory_attr()"), which changed from using __set_pte_at() to using > set_memory_attr() which uses set_pte_at(). > > set_pte_at() has additional checks and shall not be used to updating an > existing PTE. > > Wondering if I should just use __set_pte_at() instead like in the past, > or do like commit 9f7853d7609d ("powerpc/mm: Fix set_memory_*() against > concurrent accesses") and use pte_update() > > Michael, Aneesh, any suggestion ? The motivation for using pte_update() in that commit is that it does the update atomically and also handles flushing the HPTE for 64-bit Hash. But the books/32 version of pte_update() doesn't do that. In fact there's some HPTE handling in __set_pte_at(), but then also a comment saying it's handling in a subsequent flush_tlb_xxx(). So that doesn't really help make a decision :) On the other hand, could you convert those set_memory_attr() calls to change_memory_attr() and then eventually drop the former? cheers
Le 09/12/2021 à 07:08, Michael Ellerman a écrit : > Christophe Leroy <christophe.leroy@csgroup.eu> writes: >> Le 07/12/2021 à 11:34, Maxime Bizon a écrit : >>> >>> On Tue, 2021-12-07 at 06:10 +0000, Christophe Leroy wrote: >>> >>> Hello, >>> >>> With the patch applied and >>> >>> CONFIG_DEBUG_PAGEALLOC=y >>> CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT=y >>> CONFIG_DEBUG_VM=y >>> >>> I get tons of this during boot: >>> >>> [ 0.000000] Dentry cache hash table entries: 262144 (order: 8, 1048576 bytes, linear) >>> [ 0.000000] Inode-cache hash table entries: 131072 (order: 7, 524288 bytes, linear) >>> [ 0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off >>> [ 0.000000] ------------[ cut here ]------------ >>> [ 0.000000] WARNING: CPU: 0 PID: 0 at arch/powerpc/mm/pgtable.c:194 set_pte_at+0x18/0x160 >>> [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.15.0+ #442 >>> [ 0.000000] NIP: 80015ebc LR: 80016728 CTR: 800166e4 >>> [ 0.000000] REGS: 80751dd0 TRAP: 0700 Not tainted (5.15.0+) >>> [ 0.000000] MSR: 00021032 <ME,IR,DR,RI> CR: 42228882 XER: 20000000 >>> [ 0.000000] >>> [ 0.000000] GPR00: 800b8dc8 80751e80 806c6300 807311d8 807a1000 8ffffe84 80751ea8 00000000 >>> [ 0.000000] GPR08: 007a1591 00000001 007a1180 00000000 42224882 00000000 3ff9c608 3fffd79c >>> [ 0.000000] GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 800166e4 807a2000 >>> [ 0.000000] GPR24: 807a1fff 807311d8 807311d8 807a2000 80768804 00000000 807a1000 007a1180 >>> [ 0.000000] NIP [80015ebc] set_pte_at+0x18/0x160 >>> [ 0.000000] LR [80016728] set_page_attr+0x44/0xc0 >>> [ 0.000000] Call Trace: >>> [ 0.000000] [80751e80] [80058570] console_unlock+0x340/0x428 (unreliable) >>> [ 0.000000] [80751ea0] [00000000] 0x0 >>> [ 0.000000] [80751ec0] [800b8dc8] __apply_to_page_range+0x144/0x2a8 >>> [ 0.000000] [80751f00] [80016918] __kernel_map_pages+0x54/0x64 >>> [ 0.000000] [80751f10] [800cfeb0] __free_pages_ok+0x1b0/0x440 >>> [ 0.000000] [80751f50] [805cfc8c] memblock_free_all+0x1d8/0x274 >>> [ 0.000000] [80751f90] [805c5e0c] mem_init+0x3c/0xd0 >>> [ 0.000000] [80751fb0] [805c0bdc] start_kernel+0x404/0x5c4 >>> [ 0.000000] [80751ff0] [000033f0] 0x33f0 >>> [ 0.000000] Instruction dump: >>> [ 0.000000] 7c630034 83e1000c 5463d97e 7c0803a6 38210010 4e800020 9421ffe0 93e1001c >>> [ 0.000000] 83e60000 81250000 71290001 41820014 <0fe00000> 7c0802a6 93c10018 90010024 >>> >>> >> >> That's unrelated to this patch. >> >> The problem is linked to patch c988cfd38e48 ("powerpc/32: use >> set_memory_attr()"), which changed from using __set_pte_at() to using >> set_memory_attr() which uses set_pte_at(). >> >> set_pte_at() has additional checks and shall not be used to updating an >> existing PTE. >> >> Wondering if I should just use __set_pte_at() instead like in the past, >> or do like commit 9f7853d7609d ("powerpc/mm: Fix set_memory_*() against >> concurrent accesses") and use pte_update() >> >> Michael, Aneesh, any suggestion ? > > The motivation for using pte_update() in that commit is that it does the > update atomically and also handles flushing the HPTE for 64-bit Hash. > > But the books/32 version of pte_update() doesn't do that. In fact > there's some HPTE handling in __set_pte_at(), but then also a comment > saying it's handling in a subsequent flush_tlb_xxx(). > > So that doesn't really help make a decision :) > > On the other hand, could you convert those set_memory_attr() calls to > change_memory_attr() and then eventually drop the former? Sure, that's probably the best. Initially I had to implement that set_memory_attr() variant because change_memory_attr() was doing a pte_clear() that was "sawing off the branch we're sitting on". In extenso mark_rodata_ro() couldn't use change_memory_attr() to change the text section to read-only because mark_rodata_ro() is itself in the text section. But now that change_memory_attr() is using pte_update() instead of going via a pte_clear(), it's possible to use it, so that's what I'll do. Thanks for the idea. Christophe
Michael, ping. Le 07/12/2021 à 07:10, Christophe Leroy a écrit : > Allthough kernel text is always mapped with BATs, we still have > inittext mapped with pages, so TLB miss handling is required > when CONFIG_DEBUG_PAGEALLOC or CONFIG_KFENCE is set. > > The final solution should be to set a BAT that also maps inittext > but that BAT then needs to be cleared at end of init, and it will > require more changes to be able to do it properly. > > As DEBUG_PAGEALLOC or KFENCE are debugging, performance is not a big > deal so let's fix it simply for now to enable easy stable application. > > Reported-by: Maxime Bizon <mbizon@freebox.fr> > Fixes: 035b19a15a98 ("powerpc/32s: Always map kernel text and rodata with BATs") > Cc: stable@vger.kernel.org > Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> > --- > arch/powerpc/kernel/head_book3s_32.S | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/arch/powerpc/kernel/head_book3s_32.S b/arch/powerpc/kernel/head_book3s_32.S > index 68e5c0a7e99d..2e2a8211b17b 100644 > --- a/arch/powerpc/kernel/head_book3s_32.S > +++ b/arch/powerpc/kernel/head_book3s_32.S > @@ -421,14 +421,14 @@ InstructionTLBMiss: > */ > /* Get PTE (linux-style) and check access */ > mfspr r3,SPRN_IMISS > -#ifdef CONFIG_MODULES > +#if defined(CONFIG_MODULES) || defined(CONFIG_DEBUG_PAGEALLOC) || defined(CONFIG_KFENCE) > lis r1, TASK_SIZE@h /* check if kernel address */ > cmplw 0,r1,r3 > #endif > mfspr r2, SPRN_SDR1 > li r1,_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_EXEC | _PAGE_USER > rlwinm r2, r2, 28, 0xfffff000 > -#ifdef CONFIG_MODULES > +#if defined(CONFIG_MODULES) || defined(CONFIG_DEBUG_PAGEALLOC) || defined(CONFIG_KFENCE) > bgt- 112f > lis r2, (swapper_pg_dir - PAGE_OFFSET)@ha /* if kernel address, use */ > li r1,_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_EXEC
On Tue, 7 Dec 2021 06:10:05 +0000, Christophe Leroy wrote: > Allthough kernel text is always mapped with BATs, we still have > inittext mapped with pages, so TLB miss handling is required > when CONFIG_DEBUG_PAGEALLOC or CONFIG_KFENCE is set. > > The final solution should be to set a BAT that also maps inittext > but that BAT then needs to be cleared at end of init, and it will > require more changes to be able to do it properly. > > [...] Applied to powerpc/fixes. [1/1] powerpc/603: Fix boot failure with DEBUG_PAGEALLOC and KFENCE https://git.kernel.org/powerpc/c/9bb162fa26ed76031ed0e7dbc77ccea0bf977758 cheers
diff --git a/arch/powerpc/kernel/head_book3s_32.S b/arch/powerpc/kernel/head_book3s_32.S index 68e5c0a7e99d..2e2a8211b17b 100644 --- a/arch/powerpc/kernel/head_book3s_32.S +++ b/arch/powerpc/kernel/head_book3s_32.S @@ -421,14 +421,14 @@ InstructionTLBMiss: */ /* Get PTE (linux-style) and check access */ mfspr r3,SPRN_IMISS -#ifdef CONFIG_MODULES +#if defined(CONFIG_MODULES) || defined(CONFIG_DEBUG_PAGEALLOC) || defined(CONFIG_KFENCE) lis r1, TASK_SIZE@h /* check if kernel address */ cmplw 0,r1,r3 #endif mfspr r2, SPRN_SDR1 li r1,_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_EXEC | _PAGE_USER rlwinm r2, r2, 28, 0xfffff000 -#ifdef CONFIG_MODULES +#if defined(CONFIG_MODULES) || defined(CONFIG_DEBUG_PAGEALLOC) || defined(CONFIG_KFENCE) bgt- 112f lis r2, (swapper_pg_dir - PAGE_OFFSET)@ha /* if kernel address, use */ li r1,_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_EXEC
Allthough kernel text is always mapped with BATs, we still have inittext mapped with pages, so TLB miss handling is required when CONFIG_DEBUG_PAGEALLOC or CONFIG_KFENCE is set. The final solution should be to set a BAT that also maps inittext but that BAT then needs to be cleared at end of init, and it will require more changes to be able to do it properly. As DEBUG_PAGEALLOC or KFENCE are debugging, performance is not a big deal so let's fix it simply for now to enable easy stable application. Reported-by: Maxime Bizon <mbizon@freebox.fr> Fixes: 035b19a15a98 ("powerpc/32s: Always map kernel text and rodata with BATs") Cc: stable@vger.kernel.org Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> --- arch/powerpc/kernel/head_book3s_32.S | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)