Message ID | 87edvhntv0.fsf@mpe.ellerman.id.au (mailing list archive) |
---|---|
State | Not Applicable |
Headers | show |
Series | [GIT,PULL] Please pull powerpc/linux.git powerpc-6.1-1 tag | expand |
The pull request you sent on Sun, 09 Oct 2022 22:01:39 +1100:
> https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git tags/powerpc-6.1-1
has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/4899a36f91a9f9b06878471096bd143e7253006d
Thank you!
Hi Michael, On Sun, Oct 09, 2022 at 10:01:39PM +1100, Michael Ellerman wrote: > powerpc updates for 6.1 > > - Remove our now never-true definitions for pgd_huge() and p4d_leaf(). > > - Add pte_needs_flush() and huge_pmd_needs_flush() for 64-bit. > > - Add support for syscall wrappers. > > - Add support for KFENCE on 64-bit. > > - Update 64-bit HV KVM to use the new guest state entry/exit accounting API. > > - Support execute-only memory when using the Radix MMU (P9 or later). > > - Implement CONFIG_PARAVIRT_TIME_ACCOUNTING for pseries guests. > > - Updates to our linker script to move more data into read-only sections. > > - Allow the VDSO to be randomised on 32-bit. > > - Many other small features and fixes. FYI, something in here broke the wireguard test suite, which runs the iperf3 networking utility. The full log is here [1], but the relevant part is: [+] NS1: iperf3 -Z -t 3 -c 192.168.241.2 Connecting to host 192.168.241.2, port 5201 iperf3: error - failed to read /dev/urandom: Bad address I'll see if I can narrow it down a bit more and bisect. But just FYI, in case you have an intuition. Jason [1] https://build.wireguard.com/linux/4de65c5830233e7a4adf2e679510089ec4e210c7/powerpc.log
On Mon, Oct 10, 2022 at 01:25:25PM -0600, Jason A. Donenfeld wrote: > Hi Michael, > > On Sun, Oct 09, 2022 at 10:01:39PM +1100, Michael Ellerman wrote: > > powerpc updates for 6.1 > > > > - Remove our now never-true definitions for pgd_huge() and p4d_leaf(). > > > > - Add pte_needs_flush() and huge_pmd_needs_flush() for 64-bit. > > > > - Add support for syscall wrappers. > > > > - Add support for KFENCE on 64-bit. > > > > - Update 64-bit HV KVM to use the new guest state entry/exit accounting API. > > > > - Support execute-only memory when using the Radix MMU (P9 or later). > > > > - Implement CONFIG_PARAVIRT_TIME_ACCOUNTING for pseries guests. > > > > - Updates to our linker script to move more data into read-only sections. > > > > - Allow the VDSO to be randomised on 32-bit. > > > > - Many other small features and fixes. > > FYI, something in here broke the wireguard test suite, which runs the > iperf3 networking utility. The full log is here [1], but the relevant part > is: > > [+] NS1: iperf3 -Z -t 3 -c 192.168.241.2 > Connecting to host 192.168.241.2, port 5201 > iperf3: error - failed to read /dev/urandom: Bad address > > I'll see if I can narrow it down a bit more and bisect. But just FYI, in > case you have an intuition. Huh. From iov_iter.c: static int copyout(void __user *to, const void *from, size_t n) { size_t before = n; if (should_fail_usercopy()) return n; if (access_ok(to, n)) { instrument_copy_to_user(to, from, n); n = raw_copy_to_user(to, from, n); if (n == before) pr_err("SARU n still %zu pointer is %lx\n", n, (unsigned long)to); } return n; } I added the pr_err() there to catch the failure: [ 3.443506] SARU n still 64 pointer is b78db000 Also I managed to extract the failing portion of iperf3 into something smaller: int temp; char *x; ssize_t l; FILE *f; char template[] = "/blah-XXXXXX"; temp = mkstemp(template); if (temp < 0) panic("mkstemp"); if (unlink(template) < 0) panic("unlink"); if (ftruncate(temp, 0x20000) < 0) panic("ftruncate"); x = mmap(NULL, 0x20000, PROT_READ|PROT_WRITE, MAP_PRIVATE, temp, 0); if (x == MAP_FAILED) panic("mmap"); f = fopen("/dev/urandom", "rb"); if (!f) panic("fopen"); setbuf(f, NULL); if (fread(x, 1, 0x20000, f) != 0x20000) panic("fread"); Jason
On Mon, Oct 10, 2022 at 02:03:09PM -0600, Jason A. Donenfeld wrote: > On Mon, Oct 10, 2022 at 01:25:25PM -0600, Jason A. Donenfeld wrote: > > Hi Michael, > > > > On Sun, Oct 09, 2022 at 10:01:39PM +1100, Michael Ellerman wrote: > > > powerpc updates for 6.1 > > > > > > - Remove our now never-true definitions for pgd_huge() and p4d_leaf(). > > > > > > - Add pte_needs_flush() and huge_pmd_needs_flush() for 64-bit. > > > > > > - Add support for syscall wrappers. > > > > > > - Add support for KFENCE on 64-bit. > > > > > > - Update 64-bit HV KVM to use the new guest state entry/exit accounting API. > > > > > > - Support execute-only memory when using the Radix MMU (P9 or later). > > > > > > - Implement CONFIG_PARAVIRT_TIME_ACCOUNTING for pseries guests. > > > > > > - Updates to our linker script to move more data into read-only sections. > > > > > > - Allow the VDSO to be randomised on 32-bit. > > > > > > - Many other small features and fixes. > > > > FYI, something in here broke the wireguard test suite, which runs the > > iperf3 networking utility. The full log is here [1], but the relevant part > > is: > > > > [+] NS1: iperf3 -Z -t 3 -c 192.168.241.2 > > Connecting to host 192.168.241.2, port 5201 > > iperf3: error - failed to read /dev/urandom: Bad address > > > > I'll see if I can narrow it down a bit more and bisect. But just FYI, in > > case you have an intuition. > > Huh. From iov_iter.c: > > static int copyout(void __user *to, const void *from, size_t n) > { > size_t before = n; > if (should_fail_usercopy()) > return n; > if (access_ok(to, n)) { > instrument_copy_to_user(to, from, n); > n = raw_copy_to_user(to, from, n); > if (n == before) > pr_err("SARU n still %zu pointer is %lx\n", n, (unsigned long)to); > } > return n; > } > > I added the pr_err() there to catch the failure: > [ 3.443506] SARU n still 64 pointer is b78db000 > > Also I managed to extract the failing portion of iperf3 into something > smaller: > > int temp; > char *x; > ssize_t l; > FILE *f; > char template[] = "/blah-XXXXXX"; > > temp = mkstemp(template); > if (temp < 0) > panic("mkstemp"); > if (unlink(template) < 0) > panic("unlink"); > if (ftruncate(temp, 0x20000) < 0) > panic("ftruncate"); > x = mmap(NULL, 0x20000, PROT_READ|PROT_WRITE, MAP_PRIVATE, temp, 0); > if (x == MAP_FAILED) > panic("mmap"); > f = fopen("/dev/urandom", "rb"); > if (!f) > panic("fopen"); > setbuf(f, NULL); > if (fread(x, 1, 0x20000, f) != 0x20000) > panic("fread"); > > Jason Bisected: 7e92e01b724526b98cbc7f03dd4afa0295780d56 is the first bad commit commit 7e92e01b724526b98cbc7f03dd4afa0295780d56 Author: Rohan McLure <rmclure@linux.ibm.com> Date: Wed Sep 21 16:56:01 2022 +1000 powerpc: Provide syscall wrapper Implement syscall wrapper as per s390, x86, arm64. When enabled cause handlers to accept parameters from a stack frame rather than from user scratch register state. This allows for user registers to be safely cleared in order to reduce caller influence on speculation within syscall routine. The wrapper is a macro that emits syscall handler symbols that call into the target handler, obtaining its parameters from a struct pt_regs on the stack. As registers are already saved to the stack prior to calling system_call_exception, it appears that this function is executed more efficiently with the new stack-pointer convention than with parameters passed by registers, avoiding the allocation of a stack frame for this method. On a 32-bit system, we see >20% performance increases on the null_syscall microbenchmark, and on a Power 8 the performance gains amortise the cost of clearing and restoring registers which is implemented at the end of this series, seeing final result of ~5.6% performance improvement on null_syscall. Syscalls are wrapped in this fashion on all platforms except for the Cell processor as this commit does not provide SPU support. This can be quickly fixed in a successive patch, but requires spu_sys_callback to allocate a pt_regs structure to satisfy the wrapped calling convention. Co-developed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Rohan McLure <rmclure@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmai.com> [mpe: Make incompatible with COMPAT to retain clearing of high bits of args] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220921065605.1051927-22-rmclure@linux.ibm.com arch/powerpc/Kconfig | 1 + arch/powerpc/include/asm/syscall.h | 4 +++ arch/powerpc/include/asm/syscall_wrapper.h | 51 ++++++++++++++++++++++++++++++ arch/powerpc/include/asm/syscalls.h | 24 ++++++++++++-- arch/powerpc/kernel/syscall.c | 34 ++++++++++---------- arch/powerpc/kernel/systbl.c | 7 ++++ arch/powerpc/kernel/vdso.c | 2 ++ 7 files changed, 105 insertions(+), 18 deletions(-) create mode 100644 arch/powerpc/include/asm/syscall_wrapper.h
On Mon, 2022-10-10 at 16:26 -0600, Jason A. Donenfeld wrote: > > Bisected: > > 7e92e01b724526b98cbc7f03dd4afa0295780d56 is the first bad commit > commit 7e92e01b724526b98cbc7f03dd4afa0295780d56 > Author: Rohan McLure <rmclure@linux.ibm.com> > Date: Wed Sep 21 16:56:01 2022 +1000 > > powerpc: Provide syscall wrapper > > Implement syscall wrapper as per s390, x86, arm64. When enabled > cause handlers to accept parameters from a stack frame rather > than > from user scratch register state. This allows for user registers > to be > safely cleared in order to reduce caller influence on speculation > within syscall routine. The wrapper is a macro that emits syscall > handler symbols that call into the target handler, obtaining its > parameters from a struct pt_regs on the stack. > > As registers are already saved to the stack prior to calling > system_call_exception, it appears that this function is executed > more > efficiently with the new stack-pointer convention than with > parameters > passed by registers, avoiding the allocation of a stack frame for > this > method. On a 32-bit system, we see >20% performance increases on > the > null_syscall microbenchmark, and on a Power 8 the performance > gains > amortise the cost of clearing and restoring registers which is > implemented at the end of this series, seeing final result of > ~5.6% > performance improvement on null_syscall. > > Syscalls are wrapped in this fashion on all platforms except for > the > Cell processor as this commit does not provide SPU support. This > can be > quickly fixed in a successive patch, but requires > spu_sys_callback to > allocate a pt_regs structure to satisfy the wrapped calling > convention. > > Co-developed-by: Andrew Donnellan <ajd@linux.ibm.com> > Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> > Signed-off-by: Rohan McLure <rmclure@linux.ibm.com> > Reviewed-by: Nicholas Piggin <npiggin@gmai.com> > [mpe: Make incompatible with COMPAT to retain clearing of high > bits of args] > Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> > Link: > https://lore.kernel.org/r/20220921065605.1051927-22-rmclure@linux.ibm.com Thanks for bisecting, this is interesting! Could you provide your .config and the environment you're running in? Your reproducer doesn't seem to trigger it on my baremetal POWER8 pseries_le_defconfig.
Hi Andrew, On Tue, Oct 11, 2022 at 11:00:15AM +1100, Andrew Donnellan wrote: > Thanks for bisecting, this is interesting! Could you provide your > .config and the environment you're running in? Your reproducer doesn't > seem to trigger it on my baremetal POWER8 pseries_le_defconfig. Sure. .config: https://xn--4db.cc/NemFt2Vs (change CONFIG_INITRAMFS_SOURCE) Toolchain: https://download.wireguard.com/qemu-test/toolchains/20211123/powerpc-linux-musl-cross.tgz You can also just run: ARCH=powerpc make -C tools/testing/selftests/wireguard/qemu -j$(nproc) And that'll assemble the whole thing. Jason
"Jason A. Donenfeld" <Jason@zx2c4.com> writes: > Hi Andrew, > > On Tue, Oct 11, 2022 at 11:00:15AM +1100, Andrew Donnellan wrote: >> Thanks for bisecting, this is interesting! Could you provide your >> .config and the environment you're running in? Your reproducer doesn't >> seem to trigger it on my baremetal POWER8 pseries_le_defconfig. > > Sure. > > .config: https://xn--4db.cc/NemFt2Vs (change CONFIG_INITRAMFS_SOURCE) > Toolchain: https://download.wireguard.com/qemu-test/toolchains/20211123/powerpc-linux-musl-cross.tgz > > You can also just run: > > ARCH=powerpc make -C tools/testing/selftests/wireguard/qemu -j$(nproc) > > And that'll assemble the whole thing. I tried that :) What host OS are you running that on? I get: mkdir -p /scratch/michael/linus/tools/testing/selftests/wireguard/qemu/build/powerpc powerpc-linux-musl-gcc -o /scratch/michael/linus/tools/testing/selftests/wireguard/qemu/build/powerpc/init -O3 -pipe -std=gnu11 init.c /scratch/michael/linus/tools/testing/selftests/wireguard/qemu/build/powerpc/powerpc-linux-musl-cross/bin/../lib/gcc/powerpc-linux-musl/11.2.1/../../../../powerpc-linux-musl/bin/ld: cannot find Scrt1.o: No such file or directory /scratch/michael/linus/tools/testing/selftests/wireguard/qemu/build/powerpc/powerpc-linux-musl-cross/bin/../lib/gcc/powerpc-linux-musl/11.2.1/../../../../powerpc-linux-musl/bin/ld: cannot find crti.o: No such file or directory /scratch/michael/linus/tools/testing/selftests/wireguard/qemu/build/powerpc/powerpc-linux-musl-cross/bin/../lib/gcc/powerpc-linux-musl/11.2.1/../../../../powerpc-linux-musl/bin/ld: cannot find crtbeginS.o: No such file or directory /scratch/michael/linus/tools/testing/selftests/wireguard/qemu/build/powerpc/powerpc-linux-musl-cross/bin/../lib/gcc/powerpc-linux-musl/11.2.1/../../../../powerpc-linux-musl/bin/ld: cannot find -lgcc /scratch/michael/linus/tools/testing/selftests/wireguard/qemu/build/powerpc/powerpc-linux-musl-cross/bin/../lib/gcc/powerpc-linux-musl/11.2.1/../../../../powerpc-linux-musl/bin/ld: cannot find -lgcc collect2: error: ld returned 1 exit status cheers
"Jason A. Donenfeld" <Jason@zx2c4.com> writes: > On Mon, Oct 10, 2022 at 01:25:25PM -0600, Jason A. Donenfeld wrote: >> Hi Michael, >> >> On Sun, Oct 09, 2022 at 10:01:39PM +1100, Michael Ellerman wrote: >> > powerpc updates for 6.1 >> > >> > - Remove our now never-true definitions for pgd_huge() and p4d_leaf(). >> > >> > - Add pte_needs_flush() and huge_pmd_needs_flush() for 64-bit. >> > >> > - Add support for syscall wrappers. >> > >> > - Add support for KFENCE on 64-bit. >> > >> > - Update 64-bit HV KVM to use the new guest state entry/exit accounting API. >> > >> > - Support execute-only memory when using the Radix MMU (P9 or later). >> > >> > - Implement CONFIG_PARAVIRT_TIME_ACCOUNTING for pseries guests. >> > >> > - Updates to our linker script to move more data into read-only sections. >> > >> > - Allow the VDSO to be randomised on 32-bit. >> > >> > - Many other small features and fixes. >> >> FYI, something in here broke the wireguard test suite, which runs the >> iperf3 networking utility. The full log is here [1], but the relevant part >> is: >> >> [+] NS1: iperf3 -Z -t 3 -c 192.168.241.2 >> Connecting to host 192.168.241.2, port 5201 >> iperf3: error - failed to read /dev/urandom: Bad address >> >> I'll see if I can narrow it down a bit more and bisect. But just FYI, in >> case you have an intuition. > > Huh. From iov_iter.c: > > static int copyout(void __user *to, const void *from, size_t n) > { > size_t before = n; > if (should_fail_usercopy()) > return n; > if (access_ok(to, n)) { > instrument_copy_to_user(to, from, n); > n = raw_copy_to_user(to, from, n); > if (n == before) > pr_err("SARU n still %zu pointer is %lx\n", n, (unsigned long)to); > } > return n; > } > > I added the pr_err() there to catch the failure: > [ 3.443506] SARU n still 64 pointer is b78db000 > > Also I managed to extract the failing portion of iperf3 into something > smaller: > > int temp; > char *x; > ssize_t l; > FILE *f; > char template[] = "/blah-XXXXXX"; > > temp = mkstemp(template); > if (temp < 0) > panic("mkstemp"); > if (unlink(template) < 0) > panic("unlink"); > if (ftruncate(temp, 0x20000) < 0) > panic("ftruncate"); > x = mmap(NULL, 0x20000, PROT_READ|PROT_WRITE, MAP_PRIVATE, temp, 0); > if (x == MAP_FAILED) > panic("mmap"); > f = fopen("/dev/urandom", "rb"); > if (!f) > panic("fopen"); > setbuf(f, NULL); > if (fread(x, 1, 0x20000, f) != 0x20000) > panic("fread"); Does that fail for you reliably? It succeeds for me running under qemu ppce500, though I'm not using your kernel config yet. cheers
On Tue, Oct 11, 2022 at 12:44:20PM +1100, Michael Ellerman wrote: > "Jason A. Donenfeld" <Jason@zx2c4.com> writes: > > Hi Andrew, > > > > On Tue, Oct 11, 2022 at 11:00:15AM +1100, Andrew Donnellan wrote: > >> Thanks for bisecting, this is interesting! Could you provide your > >> .config and the environment you're running in? Your reproducer doesn't > >> seem to trigger it on my baremetal POWER8 pseries_le_defconfig. > > > > Sure. > > > > .config: https://xn--4db.cc/NemFt2Vs (change CONFIG_INITRAMFS_SOURCE) > > Toolchain: https://download.wireguard.com/qemu-test/toolchains/20211123/powerpc-linux-musl-cross.tgz > > > > You can also just run: > > > > ARCH=powerpc make -C tools/testing/selftests/wireguard/qemu -j$(nproc) > > > > And that'll assemble the whole thing. > > I tried that :) > > What host OS are you running that on? > > I get: > > mkdir -p /scratch/michael/linus/tools/testing/selftests/wireguard/qemu/build/powerpc > powerpc-linux-musl-gcc -o /scratch/michael/linus/tools/testing/selftests/wireguard/qemu/build/powerpc/init -O3 -pipe -std=gnu11 init.c > /scratch/michael/linus/tools/testing/selftests/wireguard/qemu/build/powerpc/powerpc-linux-musl-cross/bin/../lib/gcc/powerpc-linux-musl/11.2.1/../../../../powerpc-linux-musl/bin/ld: cannot find Scrt1.o: No such file or directory > /scratch/michael/linus/tools/testing/selftests/wireguard/qemu/build/powerpc/powerpc-linux-musl-cross/bin/../lib/gcc/powerpc-linux-musl/11.2.1/../../../../powerpc-linux-musl/bin/ld: cannot find crti.o: No such file or directory > /scratch/michael/linus/tools/testing/selftests/wireguard/qemu/build/powerpc/powerpc-linux-musl-cross/bin/../lib/gcc/powerpc-linux-musl/11.2.1/../../../../powerpc-linux-musl/bin/ld: cannot find crtbeginS.o: No such file or directory > /scratch/michael/linus/tools/testing/selftests/wireguard/qemu/build/powerpc/powerpc-linux-musl-cross/bin/../lib/gcc/powerpc-linux-musl/11.2.1/../../../../powerpc-linux-musl/bin/ld: cannot find -lgcc > /scratch/michael/linus/tools/testing/selftests/wireguard/qemu/build/powerpc/powerpc-linux-musl-cross/bin/../lib/gcc/powerpc-linux-musl/11.2.1/../../../../powerpc-linux-musl/bin/ld: cannot find -lgcc > collect2: error: ld returned 1 exit status Here's what happened: - You started the thing and the kernel compile complained about an unclean tree. - You ran mrproper. - You tried to run the thing again. amirite? If so, what happened is that mrproper deleted the .o files from the toolchain. Solution: ARCH=powerpc make -C tools/testing/selftests/wireguard/qemu -j$(nproc) clean ARCH=powerpc make -C tools/testing/selftests/wireguard/qemu -j$(nproc) Let me know how that goes. Jason
On Tue, Oct 11, 2022 at 12:53:17PM +1100, Michael Ellerman wrote: > "Jason A. Donenfeld" <Jason@zx2c4.com> writes: > > On Mon, Oct 10, 2022 at 01:25:25PM -0600, Jason A. Donenfeld wrote: > >> Hi Michael, > >> > >> On Sun, Oct 09, 2022 at 10:01:39PM +1100, Michael Ellerman wrote: > >> > powerpc updates for 6.1 > >> > > >> > - Remove our now never-true definitions for pgd_huge() and p4d_leaf(). > >> > > >> > - Add pte_needs_flush() and huge_pmd_needs_flush() for 64-bit. > >> > > >> > - Add support for syscall wrappers. > >> > > >> > - Add support for KFENCE on 64-bit. > >> > > >> > - Update 64-bit HV KVM to use the new guest state entry/exit accounting API. > >> > > >> > - Support execute-only memory when using the Radix MMU (P9 or later). > >> > > >> > - Implement CONFIG_PARAVIRT_TIME_ACCOUNTING for pseries guests. > >> > > >> > - Updates to our linker script to move more data into read-only sections. > >> > > >> > - Allow the VDSO to be randomised on 32-bit. > >> > > >> > - Many other small features and fixes. > >> > >> FYI, something in here broke the wireguard test suite, which runs the > >> iperf3 networking utility. The full log is here [1], but the relevant part > >> is: > >> > >> [+] NS1: iperf3 -Z -t 3 -c 192.168.241.2 > >> Connecting to host 192.168.241.2, port 5201 > >> iperf3: error - failed to read /dev/urandom: Bad address > >> > >> I'll see if I can narrow it down a bit more and bisect. But just FYI, in > >> case you have an intuition. > > > > Huh. From iov_iter.c: > > > > static int copyout(void __user *to, const void *from, size_t n) > > { > > size_t before = n; > > if (should_fail_usercopy()) > > return n; > > if (access_ok(to, n)) { > > instrument_copy_to_user(to, from, n); > > n = raw_copy_to_user(to, from, n); > > if (n == before) > > pr_err("SARU n still %zu pointer is %lx\n", n, (unsigned long)to); > > } > > return n; > > } > > > > I added the pr_err() there to catch the failure: > > [ 3.443506] SARU n still 64 pointer is b78db000 > > > > Also I managed to extract the failing portion of iperf3 into something > > smaller: > > > > int temp; > > char *x; > > ssize_t l; > > FILE *f; > > char template[] = "/blah-XXXXXX"; > > > > temp = mkstemp(template); > > if (temp < 0) > > panic("mkstemp"); > > if (unlink(template) < 0) > > panic("unlink"); > > if (ftruncate(temp, 0x20000) < 0) > > panic("ftruncate"); > > x = mmap(NULL, 0x20000, PROT_READ|PROT_WRITE, MAP_PRIVATE, temp, 0); > > if (x == MAP_FAILED) > > panic("mmap"); > > f = fopen("/dev/urandom", "rb"); > > if (!f) > > panic("fopen"); > > setbuf(f, NULL); > > if (fread(x, 1, 0x20000, f) != 0x20000) > > panic("fread"); > > Does that fail for you reliably? > > It succeeds for me running under qemu ppce500, though I'm not using your > kernel config yet. Yes, every time without fail, across two systems and two qemu builds. Jason
"Jason A. Donenfeld" <Jason@zx2c4.com> writes: > On Tue, Oct 11, 2022 at 12:44:20PM +1100, Michael Ellerman wrote: >> "Jason A. Donenfeld" <Jason@zx2c4.com> writes: >> > Hi Andrew, >> > >> > On Tue, Oct 11, 2022 at 11:00:15AM +1100, Andrew Donnellan wrote: >> >> Thanks for bisecting, this is interesting! Could you provide your >> >> .config and the environment you're running in? Your reproducer doesn't >> >> seem to trigger it on my baremetal POWER8 pseries_le_defconfig. >> > >> > Sure. >> > >> > .config: https://xn--4db.cc/NemFt2Vs (change CONFIG_INITRAMFS_SOURCE) >> > Toolchain: https://download.wireguard.com/qemu-test/toolchains/20211123/powerpc-linux-musl-cross.tgz >> > >> > You can also just run: >> > >> > ARCH=powerpc make -C tools/testing/selftests/wireguard/qemu -j$(nproc) >> > >> > And that'll assemble the whole thing. >> >> I tried that :) >> >> What host OS are you running that on? >> >> I get: >> >> mkdir -p /scratch/michael/linus/tools/testing/selftests/wireguard/qemu/build/powerpc >> powerpc-linux-musl-gcc -o /scratch/michael/linus/tools/testing/selftests/wireguard/qemu/build/powerpc/init -O3 -pipe -std=gnu11 init.c >> /scratch/michael/linus/tools/testing/selftests/wireguard/qemu/build/powerpc/powerpc-linux-musl-cross/bin/../lib/gcc/powerpc-linux-musl/11.2.1/../../../../powerpc-linux-musl/bin/ld: cannot find Scrt1.o: No such file or directory >> /scratch/michael/linus/tools/testing/selftests/wireguard/qemu/build/powerpc/powerpc-linux-musl-cross/bin/../lib/gcc/powerpc-linux-musl/11.2.1/../../../../powerpc-linux-musl/bin/ld: cannot find crti.o: No such file or directory >> /scratch/michael/linus/tools/testing/selftests/wireguard/qemu/build/powerpc/powerpc-linux-musl-cross/bin/../lib/gcc/powerpc-linux-musl/11.2.1/../../../../powerpc-linux-musl/bin/ld: cannot find crtbeginS.o: No such file or directory >> /scratch/michael/linus/tools/testing/selftests/wireguard/qemu/build/powerpc/powerpc-linux-musl-cross/bin/../lib/gcc/powerpc-linux-musl/11.2.1/../../../../powerpc-linux-musl/bin/ld: cannot find -lgcc >> /scratch/michael/linus/tools/testing/selftests/wireguard/qemu/build/powerpc/powerpc-linux-musl-cross/bin/../lib/gcc/powerpc-linux-musl/11.2.1/../../../../powerpc-linux-musl/bin/ld: cannot find -lgcc >> collect2: error: ld returned 1 exit status > > Here's what happened: > > - You started the thing and the kernel compile complained about an > unclean tree. > - You ran mrproper. > - You tried to run the thing again. > > amirite? I think so yeah. I tried it on 3 different machines so I'm not sure exactly what I did where, but I definitely ran mrproper on one of them. > If so, what happened is that mrproper deleted the .o files from the > toolchain. Solution: > > ARCH=powerpc make -C tools/testing/selftests/wireguard/qemu -j$(nproc) clean > ARCH=powerpc make -C tools/testing/selftests/wireguard/qemu -j$(nproc) > > Let me know how that goes. Yep that works thanks. And I see the iperf failure. Though I still can't see what the bug is, but hopefully if I stare at it longer I'll work it out. cheers
"Jason A. Donenfeld" <Jason@zx2c4.com> writes: > On Tue, Oct 11, 2022 at 12:53:17PM +1100, Michael Ellerman wrote: >> "Jason A. Donenfeld" <Jason@zx2c4.com> writes: >> > On Mon, Oct 10, 2022 at 01:25:25PM -0600, Jason A. Donenfeld wrote: >> >> Hi Michael, >> >> >> >> On Sun, Oct 09, 2022 at 10:01:39PM +1100, Michael Ellerman wrote: >> >> > powerpc updates for 6.1 >> >> > >> >> > - Remove our now never-true definitions for pgd_huge() and p4d_leaf(). >> >> > >> >> > - Add pte_needs_flush() and huge_pmd_needs_flush() for 64-bit. >> >> > >> >> > - Add support for syscall wrappers. >> >> > >> >> > - Add support for KFENCE on 64-bit. >> >> > >> >> > - Update 64-bit HV KVM to use the new guest state entry/exit accounting API. >> >> > >> >> > - Support execute-only memory when using the Radix MMU (P9 or later). >> >> > >> >> > - Implement CONFIG_PARAVIRT_TIME_ACCOUNTING for pseries guests. >> >> > >> >> > - Updates to our linker script to move more data into read-only sections. >> >> > >> >> > - Allow the VDSO to be randomised on 32-bit. >> >> > >> >> > - Many other small features and fixes. >> >> >> >> FYI, something in here broke the wireguard test suite, which runs the >> >> iperf3 networking utility. The full log is here [1], but the relevant part >> >> is: >> >> >> >> [+] NS1: iperf3 -Z -t 3 -c 192.168.241.2 >> >> Connecting to host 192.168.241.2, port 5201 >> >> iperf3: error - failed to read /dev/urandom: Bad address >> >> >> >> I'll see if I can narrow it down a bit more and bisect. But just FYI, in >> >> case you have an intuition. >> > >> > Huh. From iov_iter.c: >> > >> > static int copyout(void __user *to, const void *from, size_t n) >> > { >> > size_t before = n; >> > if (should_fail_usercopy()) >> > return n; >> > if (access_ok(to, n)) { >> > instrument_copy_to_user(to, from, n); >> > n = raw_copy_to_user(to, from, n); >> > if (n == before) >> > pr_err("SARU n still %zu pointer is %lx\n", n, (unsigned long)to); >> > } >> > return n; >> > } >> > >> > I added the pr_err() there to catch the failure: >> > [ 3.443506] SARU n still 64 pointer is b78db000 >> > >> > Also I managed to extract the failing portion of iperf3 into something >> > smaller: >> > >> > int temp; >> > char *x; >> > ssize_t l; >> > FILE *f; >> > char template[] = "/blah-XXXXXX"; >> > >> > temp = mkstemp(template); >> > if (temp < 0) >> > panic("mkstemp"); >> > if (unlink(template) < 0) >> > panic("unlink"); >> > if (ftruncate(temp, 0x20000) < 0) >> > panic("ftruncate"); >> > x = mmap(NULL, 0x20000, PROT_READ|PROT_WRITE, MAP_PRIVATE, temp, 0); >> > if (x == MAP_FAILED) >> > panic("mmap"); >> > f = fopen("/dev/urandom", "rb"); >> > if (!f) >> > panic("fopen"); >> > setbuf(f, NULL); >> > if (fread(x, 1, 0x20000, f) != 0x20000) >> > panic("fread"); >> >> Does that fail for you reliably? >> >> It succeeds for me running under qemu ppce500, though I'm not using your >> kernel config yet. > > Yes, every time without fail, across two systems and two qemu builds. OK. Joel worked out that it only fails when built with musl, so that's why it's succeeding for me (built with glibc). cheers
On Tue Oct 11, 2022 at 7:35 PM AEST, Michael Ellerman wrote: > "Jason A. Donenfeld" <Jason@zx2c4.com> writes: > > On Tue, Oct 11, 2022 at 12:53:17PM +1100, Michael Ellerman wrote: > >> "Jason A. Donenfeld" <Jason@zx2c4.com> writes: > >> > On Mon, Oct 10, 2022 at 01:25:25PM -0600, Jason A. Donenfeld wrote: > >> >> Hi Michael, > >> >> > >> >> On Sun, Oct 09, 2022 at 10:01:39PM +1100, Michael Ellerman wrote: > >> >> > powerpc updates for 6.1 > >> >> > > >> >> > - Remove our now never-true definitions for pgd_huge() and p4d_leaf(). > >> >> > > >> >> > - Add pte_needs_flush() and huge_pmd_needs_flush() for 64-bit. > >> >> > > >> >> > - Add support for syscall wrappers. > >> >> > > >> >> > - Add support for KFENCE on 64-bit. > >> >> > > >> >> > - Update 64-bit HV KVM to use the new guest state entry/exit accounting API. > >> >> > > >> >> > - Support execute-only memory when using the Radix MMU (P9 or later). > >> >> > > >> >> > - Implement CONFIG_PARAVIRT_TIME_ACCOUNTING for pseries guests. > >> >> > > >> >> > - Updates to our linker script to move more data into read-only sections. > >> >> > > >> >> > - Allow the VDSO to be randomised on 32-bit. > >> >> > > >> >> > - Many other small features and fixes. > >> >> > >> >> FYI, something in here broke the wireguard test suite, which runs the > >> >> iperf3 networking utility. The full log is here [1], but the relevant part > >> >> is: > >> >> > >> >> [+] NS1: iperf3 -Z -t 3 -c 192.168.241.2 > >> >> Connecting to host 192.168.241.2, port 5201 > >> >> iperf3: error - failed to read /dev/urandom: Bad address > >> >> > >> >> I'll see if I can narrow it down a bit more and bisect. But just FYI, in > >> >> case you have an intuition. > >> > > >> > Huh. From iov_iter.c: > >> > > >> > static int copyout(void __user *to, const void *from, size_t n) > >> > { > >> > size_t before = n; > >> > if (should_fail_usercopy()) > >> > return n; > >> > if (access_ok(to, n)) { > >> > instrument_copy_to_user(to, from, n); > >> > n = raw_copy_to_user(to, from, n); > >> > if (n == before) > >> > pr_err("SARU n still %zu pointer is %lx\n", n, (unsigned long)to); > >> > } > >> > return n; > >> > } > >> > > >> > I added the pr_err() there to catch the failure: > >> > [ 3.443506] SARU n still 64 pointer is b78db000 > >> > > >> > Also I managed to extract the failing portion of iperf3 into something > >> > smaller: > >> > > >> > int temp; > >> > char *x; > >> > ssize_t l; > >> > FILE *f; > >> > char template[] = "/blah-XXXXXX"; > >> > > >> > temp = mkstemp(template); > >> > if (temp < 0) > >> > panic("mkstemp"); > >> > if (unlink(template) < 0) > >> > panic("unlink"); > >> > if (ftruncate(temp, 0x20000) < 0) > >> > panic("ftruncate"); > >> > x = mmap(NULL, 0x20000, PROT_READ|PROT_WRITE, MAP_PRIVATE, temp, 0); > >> > if (x == MAP_FAILED) > >> > panic("mmap"); > >> > f = fopen("/dev/urandom", "rb"); > >> > if (!f) > >> > panic("fopen"); > >> > setbuf(f, NULL); > >> > if (fread(x, 1, 0x20000, f) != 0x20000) > >> > panic("fread"); > >> > >> Does that fail for you reliably? > >> > >> It succeeds for me running under qemu ppce500, though I'm not using your > >> kernel config yet. > > > > Yes, every time without fail, across two systems and two qemu builds. > > OK. Joel worked out that it only fails when built with musl, so that's > why it's succeeding for me (built with glibc). This was independently discovered by several, but we worked out it's because musl uses ftruncate64 here, while glibc doesn't seem to. And ftruncate64 got broken by the syscall wrappers patch on ppc32. The kernel is seeing a 0 length ftruncate call, so the user access sigbuses and can't copy anything. This quick hack gets the test program working again. Only very lightly tested so far... Thanks, Nick --- diff --git a/arch/powerpc/include/asm/syscalls.h b/arch/powerpc/include/asm/syscalls.h index 9840d572da55..9578cc5e4f84 100644 --- a/arch/powerpc/include/asm/syscalls.h +++ b/arch/powerpc/include/asm/syscalls.h @@ -89,6 +89,27 @@ long compat_sys_rt_sigreturn(void); * responsible for combining parameter pairs. */ +#ifdef CONFIG_PPC32 +long sys_ppc_pread64(unsigned int fd, + char __user *ubuf, compat_size_t count, + u32 reg6, u32 pos1, u32 pos2); +long sys_ppc_pwrite64(unsigned int fd, + const char __user *ubuf, compat_size_t count, + u32 reg6, u32 pos1, u32 pos2); +long sys_ppc_readahead(int fd, u32 r4, + u32 offset1, u32 offset2, u32 count); +long sys_ppc_truncate64(const char __user *path, u32 reg4, + unsigned long len1, unsigned long len2); +long sys_ppc_ftruncate64(unsigned int fd, u32 reg4, + unsigned long len1, unsigned long len2); +long sys_ppc32_fadvise64(int fd, u32 unused, u32 offset1, u32 offset2, + size_t len, int advice); +long sys_ppc_sync_file_range2(int fd, unsigned int flags, + unsigned int offset1, + unsigned int offset2, + unsigned int nbytes1, + unsigned int nbytes2); +#endif #ifdef CONFIG_COMPAT long compat_sys_mmap2(unsigned long addr, size_t len, unsigned long prot, unsigned long flags, diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile index 1f121c188805..d382564034a7 100644 --- a/arch/powerpc/kernel/Makefile +++ b/arch/powerpc/kernel/Makefile @@ -73,6 +73,7 @@ obj-y := cputable.o syscalls.o \ obj-y += ptrace/ obj-$(CONFIG_PPC64) += setup_64.o irq_64.o\ paca.o nvram_64.o note.o +obj-$(CONFIG_PPC32) += sys_ppc32.o obj-$(CONFIG_COMPAT) += sys_ppc32.o signal_32.o obj-$(CONFIG_VDSO32) += vdso32_wrapper.o obj-$(CONFIG_PPC_WATCHDOG) += watchdog.o diff --git a/arch/powerpc/kernel/sys_ppc32.c b/arch/powerpc/kernel/sys_ppc32.c index dcc3c9fd4cfd..f9ce13e6c5eb 100644 --- a/arch/powerpc/kernel/sys_ppc32.c +++ b/arch/powerpc/kernel/sys_ppc32.c @@ -47,7 +47,17 @@ #include <asm/syscalls.h> #include <asm/switch_to.h> -COMPAT_SYSCALL_DEFINE6(ppc_pread64, +#ifdef CONFIG_PPC32 +#define PPC32_SYSCALL_DEFINE4 SYSCALL_DEFINE4 +#define PPC32_SYSCALL_DEFINE5 SYSCALL_DEFINE5 +#define PPC32_SYSCALL_DEFINE6 SYSCALL_DEFINE6 +#else +#define PPC32_SYSCALL_DEFINE4 COMPAT_SYSCALL_DEFINE4 +#define PPC32_SYSCALL_DEFINE5 COMPAT_SYSCALL_DEFINE5 +#define PPC32_SYSCALL_DEFINE6 COMPAT_SYSCALL_DEFINE6 +#endif + +PPC32_SYSCALL_DEFINE6(ppc_pread64, unsigned int, fd, char __user *, ubuf, compat_size_t, count, u32, reg6, u32, pos1, u32, pos2) @@ -55,7 +65,7 @@ COMPAT_SYSCALL_DEFINE6(ppc_pread64, return ksys_pread64(fd, ubuf, count, merge_64(pos1, pos2)); } -COMPAT_SYSCALL_DEFINE6(ppc_pwrite64, +PPC32_SYSCALL_DEFINE6(ppc_pwrite64, unsigned int, fd, const char __user *, ubuf, compat_size_t, count, u32, reg6, u32, pos1, u32, pos2) @@ -63,28 +73,29 @@ COMPAT_SYSCALL_DEFINE6(ppc_pwrite64, return ksys_pwrite64(fd, ubuf, count, merge_64(pos1, pos2)); } -COMPAT_SYSCALL_DEFINE5(ppc_readahead, +PPC32_SYSCALL_DEFINE5(ppc_readahead, int, fd, u32, r4, u32, offset1, u32, offset2, u32, count) { return ksys_readahead(fd, merge_64(offset1, offset2), count); } -COMPAT_SYSCALL_DEFINE4(ppc_truncate64, +PPC32_SYSCALL_DEFINE4(ppc_truncate64, const char __user *, path, u32, reg4, unsigned long, len1, unsigned long, len2) { return ksys_truncate(path, merge_64(len1, len2)); } -COMPAT_SYSCALL_DEFINE4(ppc_ftruncate64, +PPC32_SYSCALL_DEFINE4(ppc_ftruncate64, unsigned int, fd, u32, reg4, unsigned long, len1, unsigned long, len2) { + printk("ppc_ftruncate64 len1=%lx len2=%lx\n", len1, len2); return ksys_ftruncate(fd, merge_64(len1, len2)); } -COMPAT_SYSCALL_DEFINE6(ppc32_fadvise64, +PPC32_SYSCALL_DEFINE6(ppc32_fadvise64, int, fd, u32, unused, u32, offset1, u32, offset2, size_t, len, int, advice) { @@ -92,7 +103,7 @@ COMPAT_SYSCALL_DEFINE6(ppc32_fadvise64, advice); } -COMPAT_SYSCALL_DEFINE6(ppc_sync_file_range2, +PPC32_SYSCALL_DEFINE6(ppc_sync_file_range2, int, fd, unsigned int, flags, unsigned int, offset1, unsigned int, offset2, unsigned int, nbytes1, unsigned int, nbytes2) diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl index 2bca64f96164..71d6e38e5a3a 100644 --- a/arch/powerpc/kernel/syscalls/syscall.tbl +++ b/arch/powerpc/kernel/syscalls/syscall.tbl @@ -228,8 +228,8 @@ 176 64 rt_sigtimedwait sys_rt_sigtimedwait 177 nospu rt_sigqueueinfo sys_rt_sigqueueinfo compat_sys_rt_sigqueueinfo 178 nospu rt_sigsuspend sys_rt_sigsuspend compat_sys_rt_sigsuspend -179 common pread64 sys_pread64 compat_sys_ppc_pread64 -180 common pwrite64 sys_pwrite64 compat_sys_ppc_pwrite64 +179 common pread64 sys_ppc_pread64 compat_sys_ppc_pread64 +180 common pwrite64 sys_ppc_pwrite64 compat_sys_ppc_pwrite64 181 common chown sys_chown 182 common getcwd sys_getcwd 183 common capget sys_capget @@ -242,10 +242,10 @@ 188 common putpmsg sys_ni_syscall 189 nospu vfork sys_vfork 190 common ugetrlimit sys_getrlimit compat_sys_getrlimit -191 common readahead sys_readahead compat_sys_ppc_readahead +191 common readahead sys_ppc_readahead compat_sys_ppc_readahead 192 32 mmap2 sys_mmap2 compat_sys_mmap2 -193 32 truncate64 sys_truncate64 compat_sys_ppc_truncate64 -194 32 ftruncate64 sys_ftruncate64 compat_sys_ppc_ftruncate64 +193 32 truncate64 sys_ppc_truncate64 compat_sys_ppc_truncate64 +194 32 ftruncate64 sys_ppc_ftruncate64 compat_sys_ppc_ftruncate64 195 32 stat64 sys_stat64 196 32 lstat64 sys_lstat64 197 32 fstat64 sys_fstat64 @@ -288,7 +288,7 @@ 230 common io_submit sys_io_submit compat_sys_io_submit 231 common io_cancel sys_io_cancel 232 nospu set_tid_address sys_set_tid_address -233 common fadvise64 sys_fadvise64 compat_sys_ppc32_fadvise64 +233 common fadvise64 sys_ppc32_fadvise64 compat_sys_ppc32_fadvise64 234 nospu exit_group sys_exit_group 235 nospu lookup_dcookie sys_lookup_dcookie compat_sys_lookup_dcookie 236 common epoll_create sys_epoll_create @@ -390,7 +390,7 @@ 305 common signalfd sys_signalfd compat_sys_signalfd 306 common timerfd_create sys_timerfd_create 307 common eventfd sys_eventfd -308 common sync_file_range2 sys_sync_file_range2 compat_sys_ppc_sync_file_range2 +308 common sync_file_range2 sys_ppc_sync_file_range2 compat_sys_ppc_sync_file_range2 309 nospu fallocate sys_fallocate compat_sys_fallocate 310 nospu subpage_prot sys_subpage_prot 311 32 timerfd_settime sys_timerfd_settime32
On Sun, Oct 09, 2022 at 10:01:39PM +1100, Michael Ellerman wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA256 > > Hi Linus, > > Please pull powerpc updates for 6.1. > > No conflicts with your tree. There will be a conflict when you merge the kbuild tree, due > to us renaming head_fsl_booke.S to head_85xx.S. The resolution is mostly trivial, > linux-next has the correct result if it's unclear. > Post-merge problems are much more exciting when trying to run mac99 emulations in qemu. Enabling KFENCE results in log messages such as ================================ WARNING: inconsistent lock state 6.0.0-rc2-00163-ga5edf9815dd7 #1 Tainted: G N -------------------------------- inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage. swapper/0/1 [HC0[0]:SC0[0]:HE1:SE1] takes: c000000002734d68 (native_tlbie_lock){+.?.}-{2:2}, at: .native_hpte_updateboltedpp+0x1a4/0x600 {IN-SOFTIRQ-W} state was registered at: .lock_acquire+0x20c/0x520 ._raw_spin_lock+0x4c/0x70 .native_hpte_invalidate+0x62c/0x840 .hash__kernel_map_pages+0x450/0x640 .kfence_protect+0x58/0xc0 .kfence_guarded_free+0x374/0x5a0 .__slab_free+0x340/0x670 .__d_free+0x2c/0x50 .rcu_core+0x3f4/0x1750 .__do_softirq+0x1dc/0x7dc .do_softirq_own_stack+0x40/0x60 0xc00000000775bca0 .irq_exit+0x1e8/0x220 .timer_interrupt+0x284/0x700 decrementer_common_virt+0x208/0x210 irq event stamp: 243607 hardirqs last enabled at (243607): [<c0000000003bd424>] .__slab_free+0x324/0x670 hardirqs last disabled at (243606): [<c0000000003bd2f4>] .__slab_free+0x1f4/0x670 softirqs last enabled at (242982): [<c0000000010a387c>] .__do_softirq+0x7ac/0x7dc softirqs last disabled at (242973): [<c000000000014b20>] .do_softirq_own_stack+0x40/0x60 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(native_tlbie_lock); <Interrupt> lock(native_tlbie_lock); *** DEADLOCK *** and, indeed, there appear to be various deadlocks. I had to disable KFENCE to be able to test further (or maybe KFENCE works and points out the soft lockup problem observed below - hard for me to determine). > powerpc/pseries: Move dtl scanning and steal time accounting to pseries platform With this patch, CONFIG_DTL must be enabled if CONFIG_PPC_SPLPAR is enabled. CONFIG_PPC_SPLPAR=y and CONFIG_DTL=n results in build failures due to irq.c:(.text+0x2798): undefined reference to `.pseries_accumulate_stolen_time' and many similar errors. I had to enable CONFIG_DTL explicitly to be able to build my test images. CONFIG_PPC_SPLPAR now depends on or requires CONFIG_DTL which in turn depends on CONFIG_DEBUG_FS. That seems odd. With all this worked around, I still get soft lockup problems when trying to boot from SDHCI. I have not been able to bisect this problem. BUG: soft lockup - CPU#0 stuck for 23s! [dd:111] Modules linked in: CPU: 0 PID: 111 Comm: dd Not tainted 6.0.0-10822-g60bb8154d1d7 #1 Hardware name: PowerMac3,1 PPC970FX 0x3c0301 PowerMac NIP: c000000000031630 LR: c000000000031964 CTR: 0000000000000000 REGS: c000000007df36a8 TRAP: 0900 Not tainted (6.0.0-10822-g60bb8154d1d7) MSR: 800000000000b032 <SF,EE,FP,ME,IR,DR,RI> CR: 28002228 XER: 00000000 IRQMASK: 0 GPR00: c000000000031964 c000000007df3870 c0000000013e5500 c000000007df36a8 GPR04: c00000000125dd80 0000000000000000 c000000007df3814 c00000000291d018 GPR08: c000000002d4bbb8 0000000000000000 c000000007365100 c000000002d21098 GPR12: 0000000028002222 c000000002e20000 00000000100d32e0 00000000100d32b4 GPR16: 00000000100d3301 00000000100d32b9 00000000100d3358 00000000100d32bf GPR20: 0000000000002000 00000000100d3372 00000000100d331e c000000007365918 GPR24: 0000000000000000 0000000000000e60 0000000000000900 0000000000000500 GPR28: 0000000000000a00 0000000000000f00 0000000000000002 0000000000000003 NIP [c000000000031630] .replay_soft_interrupts+0x60/0x300 LR [c000000000031964] .arch_local_irq_restore+0x94/0x1c0 Call Trace: [c000000007df3870] [c000000000031964] .arch_local_irq_restore+0x94/0x1c0 (unreliable) [c000000007df38f0] [c000000000f8a444] .__schedule+0x664/0xa50 [c000000007df39d0] [c000000000f8a8b0] .schedule+0x80/0x140 [c000000007df3a50] [c00000000092f0dc] .try_to_generate_entropy+0x118/0x174 [c000000007df3b40] [c00000000092e2e4] .urandom_read_iter+0x74/0x140 [c000000007df3bc0] [c0000000003b0044] .vfs_read+0x284/0x2d0 [c000000007df3cd0] [c0000000003b0d2c] .ksys_read+0xdc/0x130 [c000000007df3d80] [c00000000002a88c] .system_call_exception+0x19c/0x330 [c000000007df3e10] [c00000000000c1d4] system_call_common+0xf4/0x258 --- interrupt: c00 at 0x7fff829fd49c NIP: 00007fff829fd49c LR: 000000001000da90 CTR: 0000000000000000 REGS: c000000007df3e80 TRAP: 0c00 Not tainted (6.0.0-10822-g60bb8154d1d7) MSR: 800000000000f032 <SF,EE,PR,FP,ME,IR,DR,RI> CR: 22002422 XER: 00000000 IRQMASK: 0 GPR00: 0000000000000003 00007ffff138df70 00007fff82af7300 0000000000000000 GPR04: 00000000101102a0 0000000000000020 0000000000000000 0000000000000000 GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR12: 0000000000000000 00007fff82bcaac0 00000000100d32e0 00000000100d32b4 GPR16: 00000000100d3301 00000000100d32b9 00000000100d3358 00000000100d32bf GPR20: 0000000000002000 00000000100d3372 00000000100d331e 0000000000000000 GPR24: 7fffffffffffffff 00000000100b3a9c 00000000101102a0 0000000000000020 GPR28: 00000000101025c0 0000000000000020 0000000000000000 0000000000000000 NIP [00007fff829fd49c] 0x7fff829fd49c LR [000000001000da90] 0x1000da90 --- interrupt: c00 Instruction dump: 3b600500 3b800a00 3ba00f00 f8010010 f821fdc1 60000000 60000000 38610078 e92d0af8 f92101f8 39200000 48039745 <60000000> 39000000 e9410180 892d0933 Kernel panic - not syncing: softlockup: hung tasks CPU: 0 PID: 111 Comm: dd Tainted: G L 6.0.0-10822-g60bb8154d1d7 #1 Hardware name: PowerMac3,1 PPC970FX 0x3c0301 PowerMac Call Trace: [c000000007df3180] [c000000000f584c0] .dump_stack_lvl+0x7c/0xc4 (unreliable) [c000000007df3210] [c0000000000d5b58] .panic+0x174/0x42c [c000000007df32c0] [c000000000231054] .watchdog_timer_fn+0x3a4/0x3e0 [c000000007df3380] [c0000000001cc65c] .__hrtimer_run_queues+0x1fc/0x650 [c000000007df3490] [c0000000001cd5cc] .hrtimer_interrupt+0x11c/0x320 [c000000007df3550] [c000000000021f5c] .timer_interrupt+0x1cc/0x600 [c000000007df3630] [c0000000000316c8] .replay_soft_interrupts+0xf8/0x300 [c000000007df3870] [c000000000031964] .arch_local_irq_restore+0x94/0x1c0 [c000000007df38f0] [c000000000f8a444] .__schedule+0x664/0xa50 [c000000007df39d0] [c000000000f8a8b0] .schedule+0x80/0x140 [c000000007df3a50] [c00000000092f0dc] .try_to_generate_entropy+0x118/0x174 [c000000007df3b40] [c00000000092e2e4] .urandom_read_iter+0x74/0x140 [c000000007df3bc0] [c0000000003b0044] .vfs_read+0x284/0x2d0 [c000000007df3cd0] [c0000000003b0d2c] .ksys_read+0xdc/0x130 [c000000007df3d80] [c00000000002a88c] .system_call_exception+0x19c/0x330 [c000000007df3e10] [c00000000000c1d4] system_call_common+0xf4/0x258 --- interrupt: c00 at 0x7fff829fd49c NIP: 00007fff829fd49c LR: 000000001000da90 CTR: 0000000000000000 REGS: c000000007df3e80 TRAP: 0c00 Tainted: G L (6.0.0-10822-g60bb8154d1d7) MSR: 800000000000f032 <SF,EE,PR,FP,ME,IR,DR,RI> CR: 22002422 XER: 00000000 IRQMASK: 0 GPR00: 0000000000000003 00007ffff138df70 00007fff82af7300 0000000000000000 GPR04: 00000000101102a0 0000000000000020 0000000000000000 0000000000000000 GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR12: 0000000000000000 00007fff82bcaac0 00000000100d32e0 00000000100d32b4 GPR16: 00000000100d3301 00000000100d32b9 00000000100d3358 00000000100d32bf GPR20: 0000000000002000 00000000100d3372 00000000100d331e 0000000000000000 GPR24: 7fffffffffffffff 00000000100b3a9c 00000000101102a0 0000000000000020 GPR28: 00000000101025c0 0000000000000020 0000000000000000 0000000000000000 NIP [00007fff829fd49c] 0x7fff829fd49c LR [000000001000da90] 0x1000da90 --- interrupt: c00 Guenter
On Wed, Oct 12, 2022 at 07:18:27AM -0700, Guenter Roeck wrote: > NIP [c000000000031630] .replay_soft_interrupts+0x60/0x300 > LR [c000000000031964] .arch_local_irq_restore+0x94/0x1c0 > Call Trace: > [c000000007df3870] [c000000000031964] .arch_local_irq_restore+0x94/0x1c0 (unreliable) > [c000000007df38f0] [c000000000f8a444] .__schedule+0x664/0xa50 > [c000000007df39d0] [c000000000f8a8b0] .schedule+0x80/0x140 > [c000000007df3a50] [c00000000092f0dc] .try_to_generate_entropy+0x118/0x174 > [c000000007df3b40] [c00000000092e2e4] .urandom_read_iter+0x74/0x140 > [c000000007df3bc0] [c0000000003b0044] .vfs_read+0x284/0x2d0 > [c000000007df3cd0] [c0000000003b0d2c] .ksys_read+0xdc/0x130 > [c000000007df3d80] [c00000000002a88c] .system_call_exception+0x19c/0x330 > [c000000007df3e10] [c00000000000c1d4] system_call_common+0xf4/0x258 Obviously the first couple lines of this concern me a bit. But I think actually this might just be a catalyst for another bug. You could view that function as basically just: while (something) schedule(); And I guess in the process of calling the scheduler a lot, which toggles interrupts a lot, something got wedged. Curious, though, I did try to reproduce this, to no avail. My .config is https://xn--4db.cc/rBvHWfDZ . What's yours? Jason
On Wed, Oct 12, 2022 at 09:49:26AM -0600, Jason A. Donenfeld wrote: > On Wed, Oct 12, 2022 at 07:18:27AM -0700, Guenter Roeck wrote: > > NIP [c000000000031630] .replay_soft_interrupts+0x60/0x300 > > LR [c000000000031964] .arch_local_irq_restore+0x94/0x1c0 > > Call Trace: > > [c000000007df3870] [c000000000031964] .arch_local_irq_restore+0x94/0x1c0 (unreliable) > > [c000000007df38f0] [c000000000f8a444] .__schedule+0x664/0xa50 > > [c000000007df39d0] [c000000000f8a8b0] .schedule+0x80/0x140 > > [c000000007df3a50] [c00000000092f0dc] .try_to_generate_entropy+0x118/0x174 > > [c000000007df3b40] [c00000000092e2e4] .urandom_read_iter+0x74/0x140 > > [c000000007df3bc0] [c0000000003b0044] .vfs_read+0x284/0x2d0 > > [c000000007df3cd0] [c0000000003b0d2c] .ksys_read+0xdc/0x130 > > [c000000007df3d80] [c00000000002a88c] .system_call_exception+0x19c/0x330 > > [c000000007df3e10] [c00000000000c1d4] system_call_common+0xf4/0x258 > > Obviously the first couple lines of this concern me a bit. But I think > actually this might just be a catalyst for another bug. You could view > that function as basically just: > > while (something) > schedule(); > > And I guess in the process of calling the scheduler a lot, which toggles > interrupts a lot, something got wedged. > > Curious, though, I did try to reproduce this, to no avail. My .config is > https://xn--4db.cc/rBvHWfDZ . What's yours? > Attached. My qemu command line is qemu-system-ppc64 -M mac99 -cpu ppc64 \ -m 1G -kernel vmlinux -snapshot -device e1000,netdev=net0 \ -netdev user,id=net0 -device sdhci-pci -device sd-card,drive=d0 \ -drive file=/var/cache/buildbot/ppc64/rootfs.ext2,format=raw,if=none,id=d0 \ -nographic -vga none -monitor null -no-reboot \ --append "root=/dev/mmcblk0 rootwait console=tty console=ttyS0" Qemu version is 7.0. The root file system is from https://github.com/groeck/linux-build-test/tree/master/rootfs/ppc64 I used to have self tests enabled, but with that (specifically, with CONFIG_STRING_SELFTEST=y) I now get a different hang, so I disabled that for the time being. Guenter CONFIG_SYSVIPC=y CONFIG_POSIX_MQUEUE=y CONFIG_HIGH_RES_TIMERS=y CONFIG_PREEMPT=y CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y CONFIG_BSD_PROCESS_ACCT=y CONFIG_BSD_PROCESS_ACCT_V3=y CONFIG_IKCONFIG=y CONFIG_IKCONFIG_PROC=y CONFIG_CGROUPS=y CONFIG_MEMCG=y CONFIG_BLK_CGROUP=y CONFIG_CGROUP_SCHED=y CONFIG_RT_GROUP_SCHED=y CONFIG_CGROUP_FREEZER=y CONFIG_CPUSETS=y CONFIG_CGROUP_DEVICE=y CONFIG_CGROUP_CPUACCT=y CONFIG_CGROUP_DEBUG=y CONFIG_NAMESPACES=y CONFIG_BLK_DEV_INITRD=y CONFIG_EMBEDDED=y CONFIG_PROFILING=y CONFIG_PPC64=y # CONFIG_PPC_POWERNV is not set CONFIG_DTL=y # CONFIG_CPU_IDLE is not set CONFIG_MODULES=y CONFIG_MODULE_UNLOAD=y CONFIG_BINFMT_MISC=m CONFIG_NET=y CONFIG_PACKET=y CONFIG_UNIX=y CONFIG_XFRM_USER=m CONFIG_XFRM_SUB_POLICY=y CONFIG_NET_KEY=m CONFIG_NET_KEY_MIGRATE=y CONFIG_INET=y CONFIG_IP_MULTICAST=y CONFIG_IP_ADVANCED_ROUTER=y CONFIG_IP_MULTIPLE_TABLES=y CONFIG_IP_ROUTE_MULTIPATH=y CONFIG_IP_ROUTE_VERBOSE=y CONFIG_IP_PNP=y CONFIG_IP_PNP_BOOTP=y CONFIG_IP_PNP_RARP=y CONFIG_NET_IPIP=m CONFIG_IP_MROUTE=y CONFIG_IP_PIMSM_V1=y CONFIG_IP_PIMSM_V2=y CONFIG_SYN_COOKIES=y CONFIG_INET_AH=m CONFIG_INET_ESP=m CONFIG_INET_IPCOMP=m CONFIG_IPV6_ROUTER_PREF=y CONFIG_INET6_AH=y CONFIG_INET6_ESP=y CONFIG_INET6_IPCOMP=m CONFIG_IPV6_TUNNEL=m CONFIG_NETFILTER=y CONFIG_NF_CONNTRACK=m CONFIG_NF_CONNTRACK_AMANDA=m CONFIG_NF_CONNTRACK_FTP=m CONFIG_NF_CONNTRACK_H323=m CONFIG_NF_CONNTRACK_IRC=m CONFIG_NF_CONNTRACK_NETBIOS_NS=m CONFIG_NF_CONNTRACK_PPTP=m CONFIG_NF_CONNTRACK_SANE=m CONFIG_NF_CONNTRACK_SIP=m CONFIG_NF_CONNTRACK_TFTP=m CONFIG_NF_CT_NETLINK=m CONFIG_NETFILTER_XT_TARGET_CLASSIFY=m CONFIG_NETFILTER_XT_TARGET_CONNMARK=m CONFIG_NETFILTER_XT_TARGET_DSCP=m CONFIG_NETFILTER_XT_TARGET_MARK=m CONFIG_NETFILTER_XT_TARGET_NFLOG=m CONFIG_NETFILTER_XT_TARGET_NFQUEUE=m CONFIG_NETFILTER_XT_TARGET_NOTRACK=m CONFIG_NETFILTER_XT_TARGET_TRACE=m CONFIG_NETFILTER_XT_TARGET_TCPMSS=m CONFIG_NETFILTER_XT_MATCH_COMMENT=m CONFIG_NETFILTER_XT_MATCH_CONNBYTES=m CONFIG_NETFILTER_XT_MATCH_CONNLIMIT=m CONFIG_NETFILTER_XT_MATCH_CONNMARK=m CONFIG_NETFILTER_XT_MATCH_CONNTRACK=m CONFIG_NETFILTER_XT_MATCH_DCCP=m CONFIG_NETFILTER_XT_MATCH_DSCP=m CONFIG_NETFILTER_XT_MATCH_ESP=m CONFIG_NETFILTER_XT_MATCH_HASHLIMIT=m CONFIG_NETFILTER_XT_MATCH_HELPER=m CONFIG_NETFILTER_XT_MATCH_LENGTH=m CONFIG_NETFILTER_XT_MATCH_LIMIT=m CONFIG_NETFILTER_XT_MATCH_MAC=m CONFIG_NETFILTER_XT_MATCH_MARK=m CONFIG_NETFILTER_XT_MATCH_MULTIPORT=m CONFIG_NETFILTER_XT_MATCH_POLICY=m CONFIG_NETFILTER_XT_MATCH_PKTTYPE=m CONFIG_NETFILTER_XT_MATCH_QUOTA=m CONFIG_NETFILTER_XT_MATCH_REALM=m CONFIG_NETFILTER_XT_MATCH_STATE=m CONFIG_NETFILTER_XT_MATCH_STATISTIC=m CONFIG_NETFILTER_XT_MATCH_STRING=m CONFIG_NETFILTER_XT_MATCH_TCPMSS=m CONFIG_NETFILTER_XT_MATCH_U32=m CONFIG_IP_NF_IPTABLES=m CONFIG_IP_NF_MATCH_AH=m CONFIG_IP_NF_MATCH_ECN=m CONFIG_IP_NF_MATCH_TTL=m CONFIG_IP_NF_FILTER=m CONFIG_IP_NF_TARGET_REJECT=m CONFIG_IP_NF_MANGLE=m CONFIG_IP_NF_TARGET_CLUSTERIP=m CONFIG_IP_NF_TARGET_ECN=m CONFIG_IP_NF_TARGET_TTL=m CONFIG_IP_NF_RAW=m CONFIG_IP_NF_ARPTABLES=m CONFIG_IP_NF_ARPFILTER=m CONFIG_IP_NF_ARP_MANGLE=m CONFIG_IP6_NF_IPTABLES=m CONFIG_IP6_NF_MATCH_EUI64=m CONFIG_IP6_NF_MATCH_FRAG=m CONFIG_IP6_NF_MATCH_OPTS=m CONFIG_IP6_NF_MATCH_HL=m CONFIG_IP6_NF_MATCH_IPV6HEADER=m CONFIG_IP6_NF_MATCH_RT=m CONFIG_IP6_NF_FILTER=m CONFIG_IP6_NF_MANGLE=m CONFIG_IP6_NF_RAW=m CONFIG_IP_SCTP=m CONFIG_TIPC=m CONFIG_BRIDGE=m CONFIG_VLAN_8021Q=m CONFIG_NET_SCHED=y CONFIG_NET_SCH_CBQ=m CONFIG_NET_SCH_HTB=m CONFIG_NET_SCH_HFSC=m CONFIG_NET_SCH_PRIO=m CONFIG_NET_SCH_RED=m CONFIG_NET_SCH_SFQ=m CONFIG_NET_SCH_TEQL=m CONFIG_NET_SCH_TBF=m CONFIG_NET_SCH_GRED=m CONFIG_NET_SCH_DSMARK=m CONFIG_NET_SCH_NETEM=m CONFIG_NET_SCH_CODEL=m CONFIG_NET_SCH_INGRESS=m CONFIG_NET_CLS_BASIC=y CONFIG_NET_CLS_U32=y CONFIG_NET_CLS_CGROUP=y CONFIG_NET_CLS_ACT=y CONFIG_NET_ACT_MIRRED=m CONFIG_NET_PKTGEN=m CONFIG_DEVTMPFS=y CONFIG_DEVTMPFS_MOUNT=y # CONFIG_FW_LOADER is not set CONFIG_BLK_DEV_LOOP=m CONFIG_BLK_DEV_NBD=m CONFIG_BLK_DEV_RAM=y CONFIG_VIRTIO_BLK=y CONFIG_BLK_DEV_NVME=y CONFIG_BLK_DEV_SD=y CONFIG_BLK_DEV_SR=y CONFIG_MEGARAID_SAS=y CONFIG_SCSI_SYM53C8XX_2=y CONFIG_SCSI_DC395x=y CONFIG_SCSI_AM53C974=y CONFIG_SCSI_VIRTIO=y CONFIG_ATA=y CONFIG_SATA_SIL=y CONFIG_PATA_MACIO=y CONFIG_MD=y CONFIG_BLK_DEV_MD=y CONFIG_MD_LINEAR=y CONFIG_MD_RAID0=y CONFIG_MD_RAID1=y CONFIG_MD_RAID10=y CONFIG_MD_MULTIPATH=y CONFIG_MD_FAULTY=y CONFIG_BLK_DEV_DM=y CONFIG_DM_CRYPT=y CONFIG_DM_SNAPSHOT=y CONFIG_DM_MIRROR=y CONFIG_DM_ZERO=y CONFIG_FUSION=y CONFIG_FUSION_SAS=y CONFIG_NETDEVICES=y CONFIG_BONDING=m CONFIG_DUMMY=m CONFIG_NETCONSOLE=y CONFIG_TUN=m CONFIG_VIRTIO_NET=y CONFIG_PCNET32=y CONFIG_NET_TULIP=y CONFIG_TULIP=y CONFIG_E100=y CONFIG_E1000=y CONFIG_E1000E=y CONFIG_NE2K_PCI=y CONFIG_8139CP=y CONFIG_PPP=m CONFIG_PPP_BSDCOMP=m CONFIG_PPP_DEFLATE=m CONFIG_PPP_FILTER=y CONFIG_PPP_MULTILINK=y CONFIG_PPPOE=m CONFIG_PPP_ASYNC=m CONFIG_PPP_SYNC_TTY=m CONFIG_SLIP=m CONFIG_SLIP_COMPRESSED=y CONFIG_SLIP_SMART=y CONFIG_SLIP_MODE_SLIP6=y CONFIG_USB_USBNET=y CONFIG_INPUT_EVDEV=y CONFIG_INPUT_TABLET=y CONFIG_INPUT_MISC=y # CONFIG_SERIO_SERPORT is not set CONFIG_SERIAL_PMACZILOG=y CONFIG_SERIAL_PMACZILOG_TTYS=y CONFIG_SERIAL_PMACZILOG_CONSOLE=y CONFIG_POWER_SUPPLY=y CONFIG_WATCHDOG=y CONFIG_SOFT_WATCHDOG=m # CONFIG_DRM_DEBUG_MODESET_LOCK is not set CONFIG_FB=y CONFIG_FB_OF=y CONFIG_FRAMEBUFFER_CONSOLE=y CONFIG_USB=y CONFIG_USB_MON=y CONFIG_USB_XHCI_HCD=y CONFIG_USB_EHCI_HCD=y CONFIG_USB_OHCI_HCD=y CONFIG_USB_STORAGE=y CONFIG_USB_UAS=y CONFIG_MMC=y CONFIG_MMC_SDHCI=y CONFIG_MMC_SDHCI_PCI=y CONFIG_NEW_LEDS=y CONFIG_LEDS_CLASS=y CONFIG_VIRTIO_PCI=y CONFIG_VIRTIO_BALLOON=y CONFIG_VIRTIO_MMIO=y CONFIG_EXT2_FS=y CONFIG_EXT2_FS_XATTR=y CONFIG_EXT2_FS_POSIX_ACL=y CONFIG_EXT3_FS=y CONFIG_EXT3_FS_POSIX_ACL=y CONFIG_EXT3_FS_SECURITY=y CONFIG_REISERFS_FS=m CONFIG_REISERFS_FS_XATTR=y CONFIG_REISERFS_FS_POSIX_ACL=y CONFIG_BTRFS_FS=y CONFIG_BTRFS_FS_POSIX_ACL=y CONFIG_AUTOFS4_FS=y CONFIG_FUSE_FS=m CONFIG_CUSE=m CONFIG_ISO9660_FS=y CONFIG_PROC_KCORE=y CONFIG_TMPFS=y CONFIG_CONFIGFS_FS=m CONFIG_SQUASHFS=y CONFIG_SQUASHFS_XATTR=y CONFIG_SQUASHFS_4K_DEVBLK_SIZE=y CONFIG_MINIX_FS=m CONFIG_ROMFS_FS=m CONFIG_NFS_FS=y CONFIG_NFS_V4=y CONFIG_ROOT_NFS=y CONFIG_NFSD=m CONFIG_NFSD_V4=y CONFIG_CIFS=m CONFIG_CIFS_XATTR=y CONFIG_CIFS_POSIX=y CONFIG_NLS_CODEPAGE_437=m CONFIG_NLS_CODEPAGE_737=m CONFIG_NLS_CODEPAGE_775=m CONFIG_NLS_CODEPAGE_850=m CONFIG_NLS_CODEPAGE_852=m CONFIG_NLS_CODEPAGE_855=m CONFIG_NLS_CODEPAGE_857=m CONFIG_NLS_CODEPAGE_860=m CONFIG_NLS_CODEPAGE_861=m CONFIG_NLS_CODEPAGE_862=m CONFIG_NLS_CODEPAGE_863=m CONFIG_NLS_CODEPAGE_864=m CONFIG_NLS_CODEPAGE_865=m CONFIG_NLS_CODEPAGE_866=m CONFIG_NLS_CODEPAGE_869=m CONFIG_NLS_CODEPAGE_936=m CONFIG_NLS_CODEPAGE_950=m CONFIG_NLS_CODEPAGE_932=m CONFIG_NLS_CODEPAGE_949=m CONFIG_NLS_CODEPAGE_874=m CONFIG_NLS_ISO8859_8=m CONFIG_NLS_CODEPAGE_1250=m CONFIG_NLS_CODEPAGE_1251=m CONFIG_NLS_ASCII=m CONFIG_NLS_ISO8859_1=m CONFIG_NLS_ISO8859_2=m CONFIG_NLS_ISO8859_3=m CONFIG_NLS_ISO8859_4=m CONFIG_NLS_ISO8859_5=m CONFIG_NLS_ISO8859_6=m CONFIG_NLS_ISO8859_7=m CONFIG_NLS_ISO8859_9=m CONFIG_NLS_ISO8859_13=m CONFIG_NLS_ISO8859_14=m CONFIG_NLS_ISO8859_15=m CONFIG_NLS_KOI8_R=m CONFIG_NLS_KOI8_U=m CONFIG_NLS_UTF8=m CONFIG_SECURITY=y CONFIG_CRYPTO_TEST=m CONFIG_CRYPTO_BLOWFISH=m CONFIG_CRYPTO_CAMELLIA=m CONFIG_CRYPTO_CAST5=m CONFIG_CRYPTO_CAST6=m CONFIG_CRYPTO_SERPENT=m CONFIG_CRYPTO_TWOFISH=m CONFIG_CRYPTO_MICHAEL_MIC=m CONFIG_CRYPTO_WP512=m CONFIG_CRYPTO_XCBC=m CONFIG_DEBUG_INFO_DWARF5=y CONFIG_KGDB=y CONFIG_KGDB_KDB=y CONFIG_PANIC_ON_OOPS=y CONFIG_SOFTLOCKUP_DETECTOR=y CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=y CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=30 CONFIG_BOOTPARAM_HUNG_TASK_PANIC=y CONFIG_DEBUG_WW_MUTEX_SLOWPATH=y CONFIG_DEBUG_LOCKDEP=y CONFIG_DEBUG_ATOMIC_SLEEP=y CONFIG_DEBUG_LIST=y CONFIG_LATENCYTOP=y CONFIG_FUNCTION_TRACER=y CONFIG_FTRACE_SYSCALLS=y CONFIG_BLK_DEV_IO_TRACE=y
On Wed, Oct 12, 2022 at 09:49:26AM -0600, Jason A. Donenfeld wrote: > On Wed, Oct 12, 2022 at 07:18:27AM -0700, Guenter Roeck wrote: > > NIP [c000000000031630] .replay_soft_interrupts+0x60/0x300 > > LR [c000000000031964] .arch_local_irq_restore+0x94/0x1c0 > > Call Trace: > > [c000000007df3870] [c000000000031964] .arch_local_irq_restore+0x94/0x1c0 (unreliable) > > [c000000007df38f0] [c000000000f8a444] .__schedule+0x664/0xa50 > > [c000000007df39d0] [c000000000f8a8b0] .schedule+0x80/0x140 > > [c000000007df3a50] [c00000000092f0dc] .try_to_generate_entropy+0x118/0x174 > > [c000000007df3b40] [c00000000092e2e4] .urandom_read_iter+0x74/0x140 > > [c000000007df3bc0] [c0000000003b0044] .vfs_read+0x284/0x2d0 > > [c000000007df3cd0] [c0000000003b0d2c] .ksys_read+0xdc/0x130 > > [c000000007df3d80] [c00000000002a88c] .system_call_exception+0x19c/0x330 > > [c000000007df3e10] [c00000000000c1d4] system_call_common+0xf4/0x258 > > Obviously the first couple lines of this concern me a bit. But I think > actually this might just be a catalyst for another bug. You could view > that function as basically just: > > while (something) > schedule(); > > And I guess in the process of calling the scheduler a lot, which toggles > interrupts a lot, something got wedged. > > Curious, though, I did try to reproduce this, to no avail. My .config is > https://xn--4db.cc/rBvHWfDZ . What's yours? I also just tried using your github linux-build-test scripts as a guide for construction a config -- https://xn--4db.cc/B0HpEQDQ -- and loaded up your rootfs over sdhci and such, and still couldn't manage to reproduce. I tried commenting out the line "if (!bits)" in _credit_init_bits(), so that the rng would never initialize, so that the schedule() loop would just keep on running indefinitely, but still no dice. But also, I'm running Linus' tree. From your log, I see "6.0.0-rc2-00163-ga5edf9815dd7". So maybe these bugs got fixed elsewhere? Jason
On Wed, Oct 12, 2022 at 10:45:46AM -0600, Jason A. Donenfeld wrote: > On Wed, Oct 12, 2022 at 09:49:26AM -0600, Jason A. Donenfeld wrote: > > On Wed, Oct 12, 2022 at 07:18:27AM -0700, Guenter Roeck wrote: > > > NIP [c000000000031630] .replay_soft_interrupts+0x60/0x300 > > > LR [c000000000031964] .arch_local_irq_restore+0x94/0x1c0 > > > Call Trace: > > > [c000000007df3870] [c000000000031964] .arch_local_irq_restore+0x94/0x1c0 (unreliable) > > > [c000000007df38f0] [c000000000f8a444] .__schedule+0x664/0xa50 > > > [c000000007df39d0] [c000000000f8a8b0] .schedule+0x80/0x140 > > > [c000000007df3a50] [c00000000092f0dc] .try_to_generate_entropy+0x118/0x174 > > > [c000000007df3b40] [c00000000092e2e4] .urandom_read_iter+0x74/0x140 > > > [c000000007df3bc0] [c0000000003b0044] .vfs_read+0x284/0x2d0 > > > [c000000007df3cd0] [c0000000003b0d2c] .ksys_read+0xdc/0x130 > > > [c000000007df3d80] [c00000000002a88c] .system_call_exception+0x19c/0x330 > > > [c000000007df3e10] [c00000000000c1d4] system_call_common+0xf4/0x258 > > > > Obviously the first couple lines of this concern me a bit. But I think > > actually this might just be a catalyst for another bug. You could view > > that function as basically just: > > > > while (something) > > schedule(); > > > > And I guess in the process of calling the scheduler a lot, which toggles > > interrupts a lot, something got wedged. > > > > Curious, though, I did try to reproduce this, to no avail. My .config is > > https://xn--4db.cc/rBvHWfDZ . What's yours? > > I also just tried using your github linux-build-test scripts as a guide > for construction a config -- https://xn--4db.cc/B0HpEQDQ -- and loaded > up your rootfs over sdhci and such, and still couldn't manage to > reproduce. I tried commenting out the line "if (!bits)" in > _credit_init_bits(), so that the rng would never initialize, so that the > schedule() loop would just keep on running indefinitely, but still no > dice. > > But also, I'm running Linus' tree. From your log, I see > "6.0.0-rc2-00163-ga5edf9815dd7". So maybe these bugs got fixed > elsewhere? > Blame me for not attaching the latest crash report. Guenter --- BUG: soft lockup - CPU#0 stuck for 23s! [dd:111] Modules linked in: CPU: 0 PID: 111 Comm: dd Not tainted 6.0.0-11414-g49da07006239 #1 Hardware name: PowerMac3,1 PPC970FX 0x3c0301 PowerMac NIP: c000000000031630 LR: c000000000031964 CTR: 0000000000000000 REGS: c000000007d5b6a8 TRAP: 0900 Not tainted (6.0.0-11414-g49da07006239) MSR: 8000000000009032 <SF,EE,ME,IR,DR,RI> CR: 28002228 XER: 00000000 IRQMASK: 0 GPR00: c000000000031964 c000000007d5b870 c0000000013e5500 c000000007d5b6a8 GPR04: c00000000125e1c0 0000000000000000 c000000007d5b814 c00000000291d018 GPR08: c000000002d4bbb8 0000000000000000 c000000007356400 c000000002d21098 GPR12: 0000000028002222 c000000002e20000 00000000100d32e0 00000000100d32b4 GPR16: 00000000100d3301 00000000100d32b9 00000000100d3358 00000000100d32bf GPR20: 0000000000002000 00000000100d3372 00000000100d331e c000000007356c18 GPR24: 0000000000000000 0000000000000e60 0000000000000900 0000000000000500 GPR28: 0000000000000a00 0000000000000f00 0000000000000002 0000000000000003 NIP [c000000000031630] .replay_soft_interrupts+0x60/0x300 LR [c000000000031964] .arch_local_irq_restore+0x94/0x1c0 Call Trace: [c000000007d5b870] [c000000000031964] .arch_local_irq_restore+0x94/0x1c0 (unreliable) [c000000007d5b8f0] [c000000000f8bac4] .__schedule+0x664/0xa50 [c000000007d5b9d0] [c000000000f8bf30] .schedule+0x80/0x140 [c000000007d5ba50] [c00000000093085c] .try_to_generate_entropy+0x118/0x174 [c000000007d5bb40] [c00000000092fa64] .urandom_read_iter+0x74/0x140 [c000000007d5bbc0] [c0000000003b0044] .vfs_read+0x284/0x2d0 [c000000007d5bcd0] [c0000000003b0d2c] .ksys_read+0xdc/0x130 [c000000007d5bd80] [c00000000002a88c] .system_call_exception+0x19c/0x330 [c000000007d5be10] [c00000000000c1d4] system_call_common+0xf4/0x258 --- interrupt: c00 at 0x7fffb5c9d49c NIP: 00007fffb5c9d49c LR: 000000001000da90 CTR: 0000000000000000 REGS: c000000007d5be80 TRAP: 0c00 Not tainted (6.0.0-11414-g49da07006239) MSR: 800000000000f032 <SF,EE,PR,FP,ME,IR,DR,RI> CR: 22002422 XER: 00000000 IRQMASK: 0 GPR00: 0000000000000003 00007ffff6dcc220 00007fffb5d97300 0000000000000000 GPR04: 00000000101102a0 0000000000000020 0000000000000000 0000000000000000 GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR12: 0000000000000000 00007fffb5e6aac0 00000000100d32e0 00000000100d32b4 GPR16: 00000000100d3301 00000000100d32b9 00000000100d3358 00000000100d32bf GPR20: 0000000000002000 00000000100d3372 00000000100d331e 0000000000000000 GPR24: 7fffffffffffffff 00000000100b3a9c 00000000101102a0 0000000000000020 GPR28: 00000000101025c0 0000000000000020 0000000000000000 0000000000000000 NIP [00007fffb5c9d49c] 0x7fffb5c9d49c LR [000000001000da90] 0x1000da90 --- interrupt: c00 Instruction dump: 3b600500 3b800a00 3ba00f00 f8010010 f821fdc1 60000000 60000000 38610078 e92d0af8 f92101f8 39200000 48039745 <60000000> 39000000 e9410180 892d0933 Kernel panic - not syncing: softlockup: hung tasks CPU: 0 PID: 111 Comm: dd Tainted: G L 6.0.0-11414-g49da07006239 #1 Hardware name: PowerMac3,1 PPC970FX 0x3c0301 PowerMac Call Trace: [c000000007d5b180] [c000000000f59b40] .dump_stack_lvl+0x7c/0xc4 (unreliable) [c000000007d5b210] [c0000000000d5b58] .panic+0x174/0x42c [c000000007d5b2c0] [c000000000231054] .watchdog_timer_fn+0x3a4/0x3e0 [c000000007d5b380] [c0000000001cc65c] .__hrtimer_run_queues+0x1fc/0x650 [c000000007d5b490] [c0000000001cd5cc] .hrtimer_interrupt+0x11c/0x320 [c000000007d5b550] [c000000000021f5c] .timer_interrupt+0x1cc/0x600 [c000000007d5b630] [c0000000000316c8] .replay_soft_interrupts+0xf8/0x300 [c000000007d5b870] [c000000000031964] .arch_local_irq_restore+0x94/0x1c0 [c000000007d5b8f0] [c000000000f8bac4] .__schedule+0x664/0xa50 [c000000007d5b9d0] [c000000000f8bf30] .schedule+0x80/0x140 [c000000007d5ba50] [c00000000093085c] .try_to_generate_entropy+0x118/0x174 [c000000007d5bb40] [c00000000092fa64] .urandom_read_iter+0x74/0x140 [c000000007d5bbc0] [c0000000003b0044] .vfs_read+0x284/0x2d0 [c000000007d5bcd0] [c0000000003b0d2c] .ksys_read+0xdc/0x130 [c000000007d5bd80] [c00000000002a88c] .system_call_exception+0x19c/0x330 [c000000007d5be10] [c00000000000c1d4] system_call_common+0xf4/0x258 --- interrupt: c00 at 0x7fffb5c9d49c NIP: 00007fffb5c9d49c LR: 000000001000da90 CTR: 0000000000000000 REGS: c000000007d5be80 TRAP: 0c00 Tainted: G L (6.0.0-11414-g49da07006239) MSR: 800000000000f032 <SF,EE,PR,FP,ME,IR,DR,RI> CR: 22002422 XER: 00000000 IRQMASK: 0 GPR00: 0000000000000003 00007ffff6dcc220 00007fffb5d97300 0000000000000000 GPR04: 00000000101102a0 0000000000000020 0000000000000000 0000000000000000 GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR12: 0000000000000000 00007fffb5e6aac0 00000000100d32e0 00000000100d32b4 GPR16: 00000000100d3301 00000000100d32b9 00000000100d3358 00000000100d32bf GPR20: 0000000000002000 00000000100d3372 00000000100d331e 0000000000000000 GPR24: 7fffffffffffffff 00000000100b3a9c 00000000101102a0 0000000000000020 GPR28: 00000000101025c0 0000000000000020 0000000000000000 0000000000000000 NIP [00007fffb5c9d49c] 0x7fffb5c9d49c LR [000000001000da90] 0x1000da90 --- interrupt: c00
On Wed, Oct 12, 2022 at 09:44:52AM -0700, Guenter Roeck wrote: > On Wed, Oct 12, 2022 at 09:49:26AM -0600, Jason A. Donenfeld wrote: > > On Wed, Oct 12, 2022 at 07:18:27AM -0700, Guenter Roeck wrote: > > > NIP [c000000000031630] .replay_soft_interrupts+0x60/0x300 > > > LR [c000000000031964] .arch_local_irq_restore+0x94/0x1c0 > > > Call Trace: > > > [c000000007df3870] [c000000000031964] .arch_local_irq_restore+0x94/0x1c0 (unreliable) > > > [c000000007df38f0] [c000000000f8a444] .__schedule+0x664/0xa50 > > > [c000000007df39d0] [c000000000f8a8b0] .schedule+0x80/0x140 > > > [c000000007df3a50] [c00000000092f0dc] .try_to_generate_entropy+0x118/0x174 > > > [c000000007df3b40] [c00000000092e2e4] .urandom_read_iter+0x74/0x140 > > > [c000000007df3bc0] [c0000000003b0044] .vfs_read+0x284/0x2d0 > > > [c000000007df3cd0] [c0000000003b0d2c] .ksys_read+0xdc/0x130 > > > [c000000007df3d80] [c00000000002a88c] .system_call_exception+0x19c/0x330 > > > [c000000007df3e10] [c00000000000c1d4] system_call_common+0xf4/0x258 > > > > Obviously the first couple lines of this concern me a bit. But I think > > actually this might just be a catalyst for another bug. You could view > > that function as basically just: > > > > while (something) > > schedule(); > > > > And I guess in the process of calling the scheduler a lot, which toggles > > interrupts a lot, something got wedged. > > > > Curious, though, I did try to reproduce this, to no avail. My .config is > > https://xn--4db.cc/rBvHWfDZ . What's yours? > > > > Attached. My qemu command line is Okay, thanks, I reproduced it. In this case, I suspect try_to_generate_entropy() is just the messenger. There's an earlier problem: BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1 caller is .__flush_tlb_pending+0x40/0xf0 CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.0.0-28380-gde492c83cae0-dirty #4 Hardware name: PowerMac3,1 PPC970FX 0x3c0301 PowerMac Call Trace: [c0000000044c3540] [c000000000f93ef0] .dump_stack_lvl+0x7c/0xc4 (unreliable) [c0000000044c35d0] [c000000000fc9550] .check_preemption_disabled+0x140/0x150 [c0000000044c3660] [c000000000073dd0] .__flush_tlb_pending+0x40/0xf0 [c0000000044c36f0] [c000000000334434] .__apply_to_page_range+0x764/0xa30 [c0000000044c3840] [c00000000006cad0] .change_memory_attr+0xf0/0x160 [c0000000044c38d0] [c0000000002a1d70] .bpf_prog_select_runtime+0x150/0x230 [c0000000044c3970] [c000000000d405d4] .bpf_prepare_filter+0x504/0x6f0 [c0000000044c3a30] [c000000000d4085c] .bpf_prog_create+0x9c/0x140 [c0000000044c3ac0] [c000000002051d9c] .ptp_classifier_init+0x44/0x78 [c0000000044c3b50] [c000000002050f3c] .sock_init+0xe0/0x100 [c0000000044c3bd0] [c000000000010bd4] .do_one_initcall+0xa4/0x438 [c0000000044c3cc0] [c000000002005008] .kernel_init_freeable+0x378/0x428 [c0000000044c3da0] [c0000000000113d8] .kernel_init+0x28/0x1a0 [c0000000044c3e10] [c00000000000ca3c] .ret_from_kernel_thread+0x58/0x60 This in turn is because __flush_tlb_pending() calls: static inline int mm_is_thread_local(struct mm_struct *mm) { return cpumask_equal(mm_cpumask(mm), cpumask_of(smp_processor_id())); } __flush_tlb_pending() has a comment about this: * Must be called from within some kind of spinlock/non-preempt region... */ void __flush_tlb_pending(struct ppc64_tlb_batch *batch) So I guess that didn't happen for some reason? Maybe this is indicative of some lock imbalance that then gets hit later? I've also managed to not hit this bug a few times. When it triggers, after "kprobes: kprobe jump-optimization is enabled. All kprobes are optimized if possible.", there's a long hang - tens seconds before it continues. When it doesn't trigger, there's no hang at that point in the boot process. Jason
On Wed, Oct 12, 2022 at 11:20:38AM -0600, Jason A. Donenfeld wrote: > On Wed, Oct 12, 2022 at 09:44:52AM -0700, Guenter Roeck wrote: > > On Wed, Oct 12, 2022 at 09:49:26AM -0600, Jason A. Donenfeld wrote: > > > On Wed, Oct 12, 2022 at 07:18:27AM -0700, Guenter Roeck wrote: > > > > NIP [c000000000031630] .replay_soft_interrupts+0x60/0x300 > > > > LR [c000000000031964] .arch_local_irq_restore+0x94/0x1c0 > > > > Call Trace: > > > > [c000000007df3870] [c000000000031964] .arch_local_irq_restore+0x94/0x1c0 (unreliable) > > > > [c000000007df38f0] [c000000000f8a444] .__schedule+0x664/0xa50 > > > > [c000000007df39d0] [c000000000f8a8b0] .schedule+0x80/0x140 > > > > [c000000007df3a50] [c00000000092f0dc] .try_to_generate_entropy+0x118/0x174 > > > > [c000000007df3b40] [c00000000092e2e4] .urandom_read_iter+0x74/0x140 > > > > [c000000007df3bc0] [c0000000003b0044] .vfs_read+0x284/0x2d0 > > > > [c000000007df3cd0] [c0000000003b0d2c] .ksys_read+0xdc/0x130 > > > > [c000000007df3d80] [c00000000002a88c] .system_call_exception+0x19c/0x330 > > > > [c000000007df3e10] [c00000000000c1d4] system_call_common+0xf4/0x258 > > > > > > Obviously the first couple lines of this concern me a bit. But I think > > > actually this might just be a catalyst for another bug. You could view > > > that function as basically just: > > > > > > while (something) > > > schedule(); > > > > > > And I guess in the process of calling the scheduler a lot, which toggles > > > interrupts a lot, something got wedged. > > > > > > Curious, though, I did try to reproduce this, to no avail. My .config is > > > https://xn--4db.cc/rBvHWfDZ . What's yours? > > > > > > > Attached. My qemu command line is > > Okay, thanks, I reproduced it. In this case, I suspect > try_to_generate_entropy() is just the messenger. There's an earlier > problem: > That problem is not new but has existed for a couple of releases, and has never caused a hang until now. > BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1 > caller is .__flush_tlb_pending+0x40/0xf0 > CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.0.0-28380-gde492c83cae0-dirty #4 > Hardware name: PowerMac3,1 PPC970FX 0x3c0301 PowerMac > Call Trace: > [c0000000044c3540] [c000000000f93ef0] .dump_stack_lvl+0x7c/0xc4 (unreliable) > [c0000000044c35d0] [c000000000fc9550] .check_preemption_disabled+0x140/0x150 > [c0000000044c3660] [c000000000073dd0] .__flush_tlb_pending+0x40/0xf0 > [c0000000044c36f0] [c000000000334434] .__apply_to_page_range+0x764/0xa30 > [c0000000044c3840] [c00000000006cad0] .change_memory_attr+0xf0/0x160 > [c0000000044c38d0] [c0000000002a1d70] .bpf_prog_select_runtime+0x150/0x230 > [c0000000044c3970] [c000000000d405d4] .bpf_prepare_filter+0x504/0x6f0 > [c0000000044c3a30] [c000000000d4085c] .bpf_prog_create+0x9c/0x140 > [c0000000044c3ac0] [c000000002051d9c] .ptp_classifier_init+0x44/0x78 > [c0000000044c3b50] [c000000002050f3c] .sock_init+0xe0/0x100 > [c0000000044c3bd0] [c000000000010bd4] .do_one_initcall+0xa4/0x438 > [c0000000044c3cc0] [c000000002005008] .kernel_init_freeable+0x378/0x428 > [c0000000044c3da0] [c0000000000113d8] .kernel_init+0x28/0x1a0 > [c0000000044c3e10] [c00000000000ca3c] .ret_from_kernel_thread+0x58/0x60 > > This in turn is because __flush_tlb_pending() calls: > > static inline int mm_is_thread_local(struct mm_struct *mm) > { > return cpumask_equal(mm_cpumask(mm), > cpumask_of(smp_processor_id())); > } > > __flush_tlb_pending() has a comment about this: > > * Must be called from within some kind of spinlock/non-preempt region... > */ > void __flush_tlb_pending(struct ppc64_tlb_batch *batch) > > So I guess that didn't happen for some reason? Maybe this is indicative > of some lock imbalance that then gets hit later? > > I've also managed to not hit this bug a few times. When it triggers, > after "kprobes: kprobe jump-optimization is enabled. All kprobes are > optimized if possible.", there's a long hang - tens seconds before it > continues. When it doesn't trigger, there's no hang at that point in the > boot process. > That probably explains why my attempts to bisect the problem were unsuccessful. Thanks, Guenter
On Wed, Oct 12, 2022 at 10:48:26AM -0700, Guenter Roeck wrote: > > I've also managed to not hit this bug a few times. When it triggers, > > after "kprobes: kprobe jump-optimization is enabled. All kprobes are > > optimized if possible.", there's a long hang - tens seconds before it > > continues. When it doesn't trigger, there's no hang at that point in the > > boot process. > > > > That probably explains why my attempts to bisect the problem were > unsuccessful. So I just did this: diff --git a/drivers/char/random.c b/drivers/char/random.c index 2fe28eeb2f38..2d70bc09db7e 100644 --- a/drivers/char/random.c +++ b/drivers/char/random.c @@ -1212,6 +1212,7 @@ static void __cold try_to_generate_entropy(void) struct entropy_timer_state stack; unsigned int i, num_different = 0; unsigned long last = random_get_entropy(); + return; for (i = 0; i < NUM_TRIAL_SAMPLES - 1; ++i) { stack.entropy = random_get_entropy(); And then ran it, and now we get the lockup from the idle process: udhcpc: started, v1.33.0 udhcpc: sending discover watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [swapper/0:0] Modules linked in: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.0.0-28380-gde492c83cae0-dirty #10 Hardware name: PowerMac3,1 PPC970FX 0x3c0301 PowerMac NIP: c0000000000300f8 LR: c0000000000304e8 CTR: c00000000001a410 REGS: c0000000028c79a8 TRAP: 0900 Not tainted (6.0.0-28380-gde492c83cae0-dirty) MSR: 800000000000b032 <SF,EE,FP,ME,IR,DR,RI> CR: 24088442 XER: 00000000 IRQMASK: 0 GPR00: c0000000000304e8 c0000000028c7b30 c000000001435500 c0000000028c79a8 GPR04: c0000000013366c0 0000000000000000 000000000010029c 0000000000000000 GPR08: c000000002d3bbb0 0000000000000000 c000000002883d00 c000000002915500 GPR12: 0000000044088442 c000000002e00000 0000000000000007 0000000002295698 GPR16: 00000000039400e8 0000000002295258 0000000002295660 00000000022953d0 GPR20: 0000000002295b10 00000000022b34d0 0000000002295b38 0000000003945500 GPR24: 0000000003945500 0000000000080000 c000000002883d80 c000000002883d00 GPR28: c00000000290d0c0 0000000000000001 c00000000290d018 c00000000290cc78 NIP [c0000000000300f8] .replay_soft_interrupts+0x28/0x2d0 LR [c0000000000304e8] .arch_local_irq_restore+0x148/0x1a0 Call Trace: [c0000000028c7b30] [c0000000000304e8] .arch_local_irq_restore+0x148/0x1a0 (unreliable) [c0000000028c7bb0] [c00000000001a388] .arch_cpu_idle+0xb8/0x140 [c0000000028c7c30] [c000000000fd4940] .default_idle_call+0x80/0xc8 [c0000000028c7ca0] [c000000000148480] .do_idle+0x150/0x1a0 [c0000000028c7d50] [c000000000148748] .cpu_startup_entry+0x38/0x40 [c0000000028c7dd0] [c0000000000113a8] .rest_init+0x168/0x170 [c0000000028c7e60] [c000000002004224] .arch_post_acpi_subsys_init+0x0/0x24 [c0000000028c7ed0] [c000000002004ba8] .start_kernel+0x8d0/0x924 [c0000000028c7f90] [c00000000000d4ac] start_here_common+0x1c/0x20 Instruction dump: 60000000 60000000 7c0802a6 f8010010 f821fe01 60000000 60000000 38610078 e92d0af8 f92101f8 39200000 4803a491 <60000000> 39200000 e9410180 f92101b0 Kernel panic - not syncing: softlockup: hung tasks CPU: 0 PID: 0 Comm: swapper/0 Tainted: G L 6.0.0-28380-gde492c83cae0-dirty #10 Hardware name: PowerMac3,1 PPC970FX 0x3c0301 PowerMac Call Trace: [c0000000028c74a0] [c000000000f93b90] .dump_stack_lvl+0x7c/0xc4 (unreliable) [c0000000028c7530] [c0000000000d2a58] .panic+0x180/0x438 [c0000000028c75e0] [c000000000232424] .watchdog_timer_fn+0x3a4/0x410 [c0000000028c76a0] [c0000000001cb964] .__hrtimer_run_queues+0x1f4/0x590 [c0000000028c77a0] [c0000000001cc354] .hrtimer_interrupt+0x134/0x300 [c0000000028c7860] [c000000000021cd4] .timer_interrupt+0x1c4/0x5d0 [c0000000028c7930] [c0000000000302f8] .replay_soft_interrupts+0x228/0x2d0 [c0000000028c7b30] [c0000000000304e8] .arch_local_irq_restore+0x148/0x1a0 [c0000000028c7bb0] [c00000000001a388] .arch_cpu_idle+0xb8/0x140 [c0000000028c7c30] [c000000000fd4940] .default_idle_call+0x80/0xc8 [c0000000028c7ca0] [c000000000148480] .do_idle+0x150/0x1a0 [c0000000028c7d50] [c000000000148748] .cpu_startup_entry+0x38/0x40 [c0000000028c7dd0] [c0000000000113a8] .rest_init+0x168/0x170 [c0000000028c7e60] [c000000002004224] .arch_post_acpi_subsys_init+0x0/0x24 [c0000000028c7ed0] [c000000002004ba8] .start_kernel+0x8d0/0x924 [c0000000028c7f90] [c00000
On Wed, Oct 12, 2022 at 11:20:38AM -0600, Jason A. Donenfeld wrote: > > I've also managed to not hit this bug a few times. When it triggers, > after "kprobes: kprobe jump-optimization is enabled. All kprobes are > optimized if possible.", there's a long hang - tens seconds before it > continues. When it doesn't trigger, there's no hang at that point in the > boot process. > I managed to bisect the problem. See below for results. Reverting the offending patch fixes the problem for me. Guenter --- # bad: [1440f576022887004f719883acb094e7e0dd4944] Merge tag 'mm-hotfixes-stable-2022-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm # good: [4fe89d07dcc2804c8b562f6c7896a45643d34b2f] Linux 6.0 git bisect start 'HEAD' 'v6.0' # good: [7171a8da00035e7913c3013ca5fb5beb5b8b22f0] Merge tag 'arm-dt-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc git bisect good 7171a8da00035e7913c3013ca5fb5beb5b8b22f0 # good: [f01603979a4afaad7504a728918b678d572cda9e] Merge tag 'gpio-updates-for-v6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux git bisect good f01603979a4afaad7504a728918b678d572cda9e # bad: [8aeab132e05fefc3a1a5277878629586bd7a3547] Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost git bisect bad 8aeab132e05fefc3a1a5277878629586bd7a3547 # bad: [493ffd6605b2d3d4dc7008ab927dba319f36671f] Merge tag 'ucount-rlimits-cleanups-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace git bisect bad 493ffd6605b2d3d4dc7008ab927dba319f36671f # good: [0e470763d84dcad27284067647dfb4b1a94dfce0] Merge tag 'efi-next-for-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi git bisect good 0e470763d84dcad27284067647dfb4b1a94dfce0 # bad: [110a58b9f91c66f743c01a2c217243d94c899c23] powerpc/boot: Explicitly disable usage of SPE instructions git bisect bad 110a58b9f91c66f743c01a2c217243d94c899c23 # good: [fdfdcfd504933ed06eb6b4c9df21eede0e213c3e] powerpc/build: put sys_call_table in .data.rel.ro if RELOCATABLE git bisect good fdfdcfd504933ed06eb6b4c9df21eede0e213c3e # good: [c2e7a19827eec443a7cbe85e8d959052412d6dc3] powerpc: Use generic fallocate compatibility syscall git bisect good c2e7a19827eec443a7cbe85e8d959052412d6dc3 # good: [56adbb7a8b6cc7fc9b940829c38494e53c9e57d1] powerpc/64/interrupt: Fix false warning in context tracking due to idle state git bisect good 56adbb7a8b6cc7fc9b940829c38494e53c9e57d1 # bad: [754f611774e4b9357a944f5b703dd291c85161cf] powerpc/64: switch asm helpers from GOT to TOC relative addressing git bisect bad 754f611774e4b9357a944f5b703dd291c85161cf # bad: [f7bff6e7759b1abb59334f6448f9ef3172c4c04a] powerpc/64/interrupt: avoid BUG/WARN recursion in interrupt entry git bisect bad f7bff6e7759b1abb59334f6448f9ef3172c4c04a # bad: [e485f6c751e0a969327336c635ca602feea117f0] powerpc/64/interrupt: Fix return to masked context after hard-mask irq becomes pending git bisect bad e485f6c751e0a969327336c635ca602feea117f0 # good: [799f7063c7645f9a751d17f5dfd73b952f962cd2] powerpc/64: mark irqs hard disabled in boot paca git bisect good 799f7063c7645f9a751d17f5dfd73b952f962cd2 # first bad commit: [e485f6c751e0a969327336c635ca602feea117f0] powerpc/64/interrupt: Fix return to masked context after hard-mask irq becomes pending
Guenter Roeck <linux@roeck-us.net> writes: > On Wed, Oct 12, 2022 at 11:20:38AM -0600, Jason A. Donenfeld wrote: >> >> I've also managed to not hit this bug a few times. When it triggers, >> after "kprobes: kprobe jump-optimization is enabled. All kprobes are >> optimized if possible.", there's a long hang - tens seconds before it >> continues. When it doesn't trigger, there's no hang at that point in the >> boot process. >> > > I managed to bisect the problem. See below for results. Reverting the > offending patch fixes the problem for me. Thanks. This is probably down to me/us not testing with PREEMPT enabled enough. cheers > --- > # bad: [1440f576022887004f719883acb094e7e0dd4944] Merge tag 'mm-hotfixes-stable-2022-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm > # good: [4fe89d07dcc2804c8b562f6c7896a45643d34b2f] Linux 6.0 > git bisect start 'HEAD' 'v6.0' > # good: [7171a8da00035e7913c3013ca5fb5beb5b8b22f0] Merge tag 'arm-dt-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc > git bisect good 7171a8da00035e7913c3013ca5fb5beb5b8b22f0 > # good: [f01603979a4afaad7504a728918b678d572cda9e] Merge tag 'gpio-updates-for-v6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux > git bisect good f01603979a4afaad7504a728918b678d572cda9e > # bad: [8aeab132e05fefc3a1a5277878629586bd7a3547] Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost > git bisect bad 8aeab132e05fefc3a1a5277878629586bd7a3547 > # bad: [493ffd6605b2d3d4dc7008ab927dba319f36671f] Merge tag 'ucount-rlimits-cleanups-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace > git bisect bad 493ffd6605b2d3d4dc7008ab927dba319f36671f > # good: [0e470763d84dcad27284067647dfb4b1a94dfce0] Merge tag 'efi-next-for-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi > git bisect good 0e470763d84dcad27284067647dfb4b1a94dfce0 > # bad: [110a58b9f91c66f743c01a2c217243d94c899c23] powerpc/boot: Explicitly disable usage of SPE instructions > git bisect bad 110a58b9f91c66f743c01a2c217243d94c899c23 > # good: [fdfdcfd504933ed06eb6b4c9df21eede0e213c3e] powerpc/build: put sys_call_table in .data.rel.ro if RELOCATABLE > git bisect good fdfdcfd504933ed06eb6b4c9df21eede0e213c3e > # good: [c2e7a19827eec443a7cbe85e8d959052412d6dc3] powerpc: Use generic fallocate compatibility syscall > git bisect good c2e7a19827eec443a7cbe85e8d959052412d6dc3 > # good: [56adbb7a8b6cc7fc9b940829c38494e53c9e57d1] powerpc/64/interrupt: Fix false warning in context tracking due to idle state > git bisect good 56adbb7a8b6cc7fc9b940829c38494e53c9e57d1 > # bad: [754f611774e4b9357a944f5b703dd291c85161cf] powerpc/64: switch asm helpers from GOT to TOC relative addressing > git bisect bad 754f611774e4b9357a944f5b703dd291c85161cf > # bad: [f7bff6e7759b1abb59334f6448f9ef3172c4c04a] powerpc/64/interrupt: avoid BUG/WARN recursion in interrupt entry > git bisect bad f7bff6e7759b1abb59334f6448f9ef3172c4c04a > # bad: [e485f6c751e0a969327336c635ca602feea117f0] powerpc/64/interrupt: Fix return to masked context after hard-mask irq becomes pending > git bisect bad e485f6c751e0a969327336c635ca602feea117f0 > # good: [799f7063c7645f9a751d17f5dfd73b952f962cd2] powerpc/64: mark irqs hard disabled in boot paca > git bisect good 799f7063c7645f9a751d17f5dfd73b952f962cd2 > # first bad commit: [e485f6c751e0a969327336c635ca602feea117f0] powerpc/64/interrupt: Fix return to masked context after hard-mask irq becomes pending
On Thu, Oct 13, 2022 at 11:03:34AM +1100, Michael Ellerman wrote: > Guenter Roeck <linux@roeck-us.net> writes: > > On Wed, Oct 12, 2022 at 11:20:38AM -0600, Jason A. Donenfeld wrote: > >> > >> I've also managed to not hit this bug a few times. When it triggers, > >> after "kprobes: kprobe jump-optimization is enabled. All kprobes are > >> optimized if possible.", there's a long hang - tens seconds before it > >> continues. When it doesn't trigger, there's no hang at that point in the > >> boot process. > >> > > > > I managed to bisect the problem. See below for results. Reverting the > > offending patch fixes the problem for me. > > Thanks. > > This is probably down to me/us not testing with PREEMPT enabled enough. > Not sure. My configuration has CONFIG_PREEMPT_NONE=y # CONFIG_PREEMPT_VOLUNTARY is not set # CONFIG_PREEMPT is not set Guenter > cheers > > > --- > > # bad: [1440f576022887004f719883acb094e7e0dd4944] Merge tag 'mm-hotfixes-stable-2022-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm > > # good: [4fe89d07dcc2804c8b562f6c7896a45643d34b2f] Linux 6.0 > > git bisect start 'HEAD' 'v6.0' > > # good: [7171a8da00035e7913c3013ca5fb5beb5b8b22f0] Merge tag 'arm-dt-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc > > git bisect good 7171a8da00035e7913c3013ca5fb5beb5b8b22f0 > > # good: [f01603979a4afaad7504a728918b678d572cda9e] Merge tag 'gpio-updates-for-v6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux > > git bisect good f01603979a4afaad7504a728918b678d572cda9e > > # bad: [8aeab132e05fefc3a1a5277878629586bd7a3547] Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost > > git bisect bad 8aeab132e05fefc3a1a5277878629586bd7a3547 > > # bad: [493ffd6605b2d3d4dc7008ab927dba319f36671f] Merge tag 'ucount-rlimits-cleanups-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace > > git bisect bad 493ffd6605b2d3d4dc7008ab927dba319f36671f > > # good: [0e470763d84dcad27284067647dfb4b1a94dfce0] Merge tag 'efi-next-for-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi > > git bisect good 0e470763d84dcad27284067647dfb4b1a94dfce0 > > # bad: [110a58b9f91c66f743c01a2c217243d94c899c23] powerpc/boot: Explicitly disable usage of SPE instructions > > git bisect bad 110a58b9f91c66f743c01a2c217243d94c899c23 > > # good: [fdfdcfd504933ed06eb6b4c9df21eede0e213c3e] powerpc/build: put sys_call_table in .data.rel.ro if RELOCATABLE > > git bisect good fdfdcfd504933ed06eb6b4c9df21eede0e213c3e > > # good: [c2e7a19827eec443a7cbe85e8d959052412d6dc3] powerpc: Use generic fallocate compatibility syscall > > git bisect good c2e7a19827eec443a7cbe85e8d959052412d6dc3 > > # good: [56adbb7a8b6cc7fc9b940829c38494e53c9e57d1] powerpc/64/interrupt: Fix false warning in context tracking due to idle state > > git bisect good 56adbb7a8b6cc7fc9b940829c38494e53c9e57d1 > > # bad: [754f611774e4b9357a944f5b703dd291c85161cf] powerpc/64: switch asm helpers from GOT to TOC relative addressing > > git bisect bad 754f611774e4b9357a944f5b703dd291c85161cf > > # bad: [f7bff6e7759b1abb59334f6448f9ef3172c4c04a] powerpc/64/interrupt: avoid BUG/WARN recursion in interrupt entry > > git bisect bad f7bff6e7759b1abb59334f6448f9ef3172c4c04a > > # bad: [e485f6c751e0a969327336c635ca602feea117f0] powerpc/64/interrupt: Fix return to masked context after hard-mask irq becomes pending > > git bisect bad e485f6c751e0a969327336c635ca602feea117f0 > > # good: [799f7063c7645f9a751d17f5dfd73b952f962cd2] powerpc/64: mark irqs hard disabled in boot paca > > git bisect good 799f7063c7645f9a751d17f5dfd73b952f962cd2 > > # first bad commit: [e485f6c751e0a969327336c635ca602feea117f0] powerpc/64/interrupt: Fix return to masked context after hard-mask irq becomes pending
On 10/12/22 10:20, Jason A. Donenfeld wrote: > On Wed, Oct 12, 2022 at 09:44:52AM -0700, Guenter Roeck wrote: >> On Wed, Oct 12, 2022 at 09:49:26AM -0600, Jason A. Donenfeld wrote: >>> On Wed, Oct 12, 2022 at 07:18:27AM -0700, Guenter Roeck wrote: >>>> NIP [c000000000031630] .replay_soft_interrupts+0x60/0x300 >>>> LR [c000000000031964] .arch_local_irq_restore+0x94/0x1c0 >>>> Call Trace: >>>> [c000000007df3870] [c000000000031964] .arch_local_irq_restore+0x94/0x1c0 (unreliable) >>>> [c000000007df38f0] [c000000000f8a444] .__schedule+0x664/0xa50 >>>> [c000000007df39d0] [c000000000f8a8b0] .schedule+0x80/0x140 >>>> [c000000007df3a50] [c00000000092f0dc] .try_to_generate_entropy+0x118/0x174 >>>> [c000000007df3b40] [c00000000092e2e4] .urandom_read_iter+0x74/0x140 >>>> [c000000007df3bc0] [c0000000003b0044] .vfs_read+0x284/0x2d0 >>>> [c000000007df3cd0] [c0000000003b0d2c] .ksys_read+0xdc/0x130 >>>> [c000000007df3d80] [c00000000002a88c] .system_call_exception+0x19c/0x330 >>>> [c000000007df3e10] [c00000000000c1d4] system_call_common+0xf4/0x258 >>> >>> Obviously the first couple lines of this concern me a bit. But I think >>> actually this might just be a catalyst for another bug. You could view >>> that function as basically just: >>> >>> while (something) >>> schedule(); >>> >>> And I guess in the process of calling the scheduler a lot, which toggles >>> interrupts a lot, something got wedged. >>> >>> Curious, though, I did try to reproduce this, to no avail. My .config is >>> https://xn--4db.cc/rBvHWfDZ . What's yours? >>> >> >> Attached. My qemu command line is > > Okay, thanks, I reproduced it. In this case, I suspect > try_to_generate_entropy() is just the messenger. There's an earlier > problem: > > BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1 > caller is .__flush_tlb_pending+0x40/0xf0 > CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.0.0-28380-gde492c83cae0-dirty #4 > Hardware name: PowerMac3,1 PPC970FX 0x3c0301 PowerMac > Call Trace: > [c0000000044c3540] [c000000000f93ef0] .dump_stack_lvl+0x7c/0xc4 (unreliable) > [c0000000044c35d0] [c000000000fc9550] .check_preemption_disabled+0x140/0x150 > [c0000000044c3660] [c000000000073dd0] .__flush_tlb_pending+0x40/0xf0 > [c0000000044c36f0] [c000000000334434] .__apply_to_page_range+0x764/0xa30 > [c0000000044c3840] [c00000000006cad0] .change_memory_attr+0xf0/0x160 > [c0000000044c38d0] [c0000000002a1d70] .bpf_prog_select_runtime+0x150/0x230 > [c0000000044c3970] [c000000000d405d4] .bpf_prepare_filter+0x504/0x6f0 > [c0000000044c3a30] [c000000000d4085c] .bpf_prog_create+0x9c/0x140 > [c0000000044c3ac0] [c000000002051d9c] .ptp_classifier_init+0x44/0x78 > [c0000000044c3b50] [c000000002050f3c] .sock_init+0xe0/0x100 > [c0000000044c3bd0] [c000000000010bd4] .do_one_initcall+0xa4/0x438 > [c0000000044c3cc0] [c000000002005008] .kernel_init_freeable+0x378/0x428 > [c0000000044c3da0] [c0000000000113d8] .kernel_init+0x28/0x1a0 > [c0000000044c3e10] [c00000000000ca3c] .ret_from_kernel_thread+0x58/0x60 > > This in turn is because __flush_tlb_pending() calls: > > static inline int mm_is_thread_local(struct mm_struct *mm) > { > return cpumask_equal(mm_cpumask(mm), > cpumask_of(smp_processor_id())); > } > > __flush_tlb_pending() has a comment about this: > > * Must be called from within some kind of spinlock/non-preempt region... > */ > void __flush_tlb_pending(struct ppc64_tlb_batch *batch) > > So I guess that didn't happen for some reason? Maybe this is indicative > of some lock imbalance that then gets hit later? I managed to bisect that problem. Unfortunately it points to the scheduler merge. No idea what to do about that. Any idea ? I am copying Peter and Ingo for comments. Guenter --- # bad: [1440f576022887004f719883acb094e7e0dd4944] Merge tag 'mm-hotfixes-stable-2022-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm # good: [4fe89d07dcc2804c8b562f6c7896a45643d34b2f] Linux 6.0 git bisect start 'HEAD' 'v6.0' # good: [7171a8da00035e7913c3013ca5fb5beb5b8b22f0] Merge tag 'arm-dt-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc git bisect good 7171a8da00035e7913c3013ca5fb5beb5b8b22f0 # good: [f01603979a4afaad7504a728918b678d572cda9e] Merge tag 'gpio-updates-for-v6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux git bisect good f01603979a4afaad7504a728918b678d572cda9e # bad: [8aeab132e05fefc3a1a5277878629586bd7a3547] Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost git bisect bad 8aeab132e05fefc3a1a5277878629586bd7a3547 # good: [493ffd6605b2d3d4dc7008ab927dba319f36671f] Merge tag 'ucount-rlimits-cleanups-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace git bisect good 493ffd6605b2d3d4dc7008ab927dba319f36671f # bad: [cdf072acb5baa18e5b05bdf3f13d6481f62396fc] Merge tag 'trace-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace git bisect bad cdf072acb5baa18e5b05bdf3f13d6481f62396fc # bad: [55be6084c8e0e0ada9278c2ab60b7a584378efda] Merge tag 'timers-core-2022-10-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip git bisect bad 55be6084c8e0e0ada9278c2ab60b7a584378efda # good: [82aad7ff7ac25c8cf09d491ae23b9823f1901486] perf/hw_breakpoint: Annotate tsk->perf_event_mutex vs ctx->mutex git bisect good 82aad7ff7ac25c8cf09d491ae23b9823f1901486 # bad: [3871d93b82a4a6c1f4308064f046a544f16ada21] Merge tag 'perf-core-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip git bisect bad 3871d93b82a4a6c1f4308064f046a544f16ada21 # good: [33f93525799fa3c841b2ba93a56b2bb32ab11dc9] sched/deadline: Move __dl_clear_params out of dl_bw lock git bisect good 33f93525799fa3c841b2ba93a56b2bb32ab11dc9 # good: [929659acea03db6411a32de9037abab9f856f586] sched/completion: Add wait_for_completion_state() git bisect good 929659acea03db6411a32de9037abab9f856f586 # good: [b0defa7ae03ecf91b8bfd10ede430cff12fcbd06] sched/fair: Make sure to try to detach at least one movable task git bisect good b0defa7ae03ecf91b8bfd10ede430cff12fcbd06 # good: [5aec788aeb8eb74282b75ac1b317beb0fbb69a42] sched: Fix TASK_state comparisons git bisect good 5aec788aeb8eb74282b75ac1b317beb0fbb69a42 # bad: [30c999937f69abf935b0228b8411713737377d9e] Merge tag 'sched-core-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip git bisect bad 30c999937f69abf935b0228b8411713737377d9e # good: [fdf756f7127185eeffe00e918e66dfee797f3625] sched: Fix more TASK_state comparisons git bisect good fdf756f7127185eeffe00e918e66dfee797f3625 # first bad commit: [30c999937f69abf935b0228b8411713737377d9e] Merge tag 'sched-core-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
On Thu Oct 13, 2022 at 10:21 AM AEST, Guenter Roeck wrote: > On Thu, Oct 13, 2022 at 11:03:34AM +1100, Michael Ellerman wrote: > > Guenter Roeck <linux@roeck-us.net> writes: > > > On Wed, Oct 12, 2022 at 11:20:38AM -0600, Jason A. Donenfeld wrote: > > >> > > >> I've also managed to not hit this bug a few times. When it triggers, > > >> after "kprobes: kprobe jump-optimization is enabled. All kprobes are > > >> optimized if possible.", there's a long hang - tens seconds before it > > >> continues. When it doesn't trigger, there's no hang at that point in the > > >> boot process. > > >> > > > > > > I managed to bisect the problem. See below for results. Reverting the > > > offending patch fixes the problem for me. > > > > Thanks. > > > > This is probably down to me/us not testing with PREEMPT enabled enough. > > > Not sure. My configuration has > > CONFIG_PREEMPT_NONE=y > # CONFIG_PREEMPT_VOLUNTARY is not set > # CONFIG_PREEMPT is not set Thanks very much for helping with this. The config snippet you posted here https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-October/249758.html has CONFIG_PREEMPT=y. How do you turn that into a .config, olddefconfig? I can't reproduce this so far using your config and qemu command line, but the patch you've bisected it to definitely could cause this. I'll keep trying... Thanks, Nick [...] > > > # first bad commit: [e485f6c751e0a969327336c635ca602feea117f0] powerpc/64/interrupt: Fix return to masked context after hard-mask irq becomes pending
On Thu Oct 13, 2022 at 2:43 PM AEST, Guenter Roeck wrote: > On 10/12/22 10:20, Jason A. Donenfeld wrote: > > On Wed, Oct 12, 2022 at 09:44:52AM -0700, Guenter Roeck wrote: > >> On Wed, Oct 12, 2022 at 09:49:26AM -0600, Jason A. Donenfeld wrote: > >>> On Wed, Oct 12, 2022 at 07:18:27AM -0700, Guenter Roeck wrote: > >>>> NIP [c000000000031630] .replay_soft_interrupts+0x60/0x300 > >>>> LR [c000000000031964] .arch_local_irq_restore+0x94/0x1c0 > >>>> Call Trace: > >>>> [c000000007df3870] [c000000000031964] .arch_local_irq_restore+0x94/0x1c0 (unreliable) > >>>> [c000000007df38f0] [c000000000f8a444] .__schedule+0x664/0xa50 > >>>> [c000000007df39d0] [c000000000f8a8b0] .schedule+0x80/0x140 > >>>> [c000000007df3a50] [c00000000092f0dc] .try_to_generate_entropy+0x118/0x174 > >>>> [c000000007df3b40] [c00000000092e2e4] .urandom_read_iter+0x74/0x140 > >>>> [c000000007df3bc0] [c0000000003b0044] .vfs_read+0x284/0x2d0 > >>>> [c000000007df3cd0] [c0000000003b0d2c] .ksys_read+0xdc/0x130 > >>>> [c000000007df3d80] [c00000000002a88c] .system_call_exception+0x19c/0x330 > >>>> [c000000007df3e10] [c00000000000c1d4] system_call_common+0xf4/0x258 > >>> > >>> Obviously the first couple lines of this concern me a bit. But I think > >>> actually this might just be a catalyst for another bug. You could view > >>> that function as basically just: > >>> > >>> while (something) > >>> schedule(); > >>> > >>> And I guess in the process of calling the scheduler a lot, which toggles > >>> interrupts a lot, something got wedged. > >>> > >>> Curious, though, I did try to reproduce this, to no avail. My .config is > >>> https://xn--4db.cc/rBvHWfDZ . What's yours? > >>> > >> > >> Attached. My qemu command line is > > > > Okay, thanks, I reproduced it. In this case, I suspect > > try_to_generate_entropy() is just the messenger. There's an earlier > > problem: > > > > BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1 > > caller is .__flush_tlb_pending+0x40/0xf0 > > CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.0.0-28380-gde492c83cae0-dirty #4 > > Hardware name: PowerMac3,1 PPC970FX 0x3c0301 PowerMac > > Call Trace: > > [c0000000044c3540] [c000000000f93ef0] .dump_stack_lvl+0x7c/0xc4 (unreliable) > > [c0000000044c35d0] [c000000000fc9550] .check_preemption_disabled+0x140/0x150 > > [c0000000044c3660] [c000000000073dd0] .__flush_tlb_pending+0x40/0xf0 > > [c0000000044c36f0] [c000000000334434] .__apply_to_page_range+0x764/0xa30 > > [c0000000044c3840] [c00000000006cad0] .change_memory_attr+0xf0/0x160 > > [c0000000044c38d0] [c0000000002a1d70] .bpf_prog_select_runtime+0x150/0x230 > > [c0000000044c3970] [c000000000d405d4] .bpf_prepare_filter+0x504/0x6f0 > > [c0000000044c3a30] [c000000000d4085c] .bpf_prog_create+0x9c/0x140 > > [c0000000044c3ac0] [c000000002051d9c] .ptp_classifier_init+0x44/0x78 > > [c0000000044c3b50] [c000000002050f3c] .sock_init+0xe0/0x100 > > [c0000000044c3bd0] [c000000000010bd4] .do_one_initcall+0xa4/0x438 > > [c0000000044c3cc0] [c000000002005008] .kernel_init_freeable+0x378/0x428 > > [c0000000044c3da0] [c0000000000113d8] .kernel_init+0x28/0x1a0 > > [c0000000044c3e10] [c00000000000ca3c] .ret_from_kernel_thread+0x58/0x60 > > > > This in turn is because __flush_tlb_pending() calls: > > > > static inline int mm_is_thread_local(struct mm_struct *mm) > > { > > return cpumask_equal(mm_cpumask(mm), > > cpumask_of(smp_processor_id())); > > } > > > > __flush_tlb_pending() has a comment about this: > > > > * Must be called from within some kind of spinlock/non-preempt region... > > */ > > void __flush_tlb_pending(struct ppc64_tlb_batch *batch) > > > > So I guess that didn't happen for some reason? Maybe this is indicative > > of some lock imbalance that then gets hit later? > > I managed to bisect that problem. Unfortunately it points to the > scheduler merge. No idea what to do about that. Any idea ? > I am copying Peter and Ingo for comments. > > # first bad commit: [30c999937f69abf935b0228b8411713737377d9e] Merge tag 'sched-core-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip This might be a red herring because I can reproduce without it. I think we can fix this with some preempt critical sections, they don't look too much of a problem. I don't know why it's not showing up earlier than this release, I'll look into it a bit more. Thanks, Nick
On Thu Oct 13, 2022 at 4:37 AM AEST, Jason A. Donenfeld wrote: > On Wed, Oct 12, 2022 at 10:48:26AM -0700, Guenter Roeck wrote: > > > I've also managed to not hit this bug a few times. When it triggers, > > > after "kprobes: kprobe jump-optimization is enabled. All kprobes are > > > optimized if possible.", there's a long hang - tens seconds before it > > > continues. When it doesn't trigger, there's no hang at that point in the > > > boot process. > > > > > > > That probably explains why my attempts to bisect the problem were > > unsuccessful. > > So I just did this: > > diff --git a/drivers/char/random.c b/drivers/char/random.c > index 2fe28eeb2f38..2d70bc09db7e 100644 > --- a/drivers/char/random.c > +++ b/drivers/char/random.c > @@ -1212,6 +1212,7 @@ static void __cold try_to_generate_entropy(void) > struct entropy_timer_state stack; > unsigned int i, num_different = 0; > unsigned long last = random_get_entropy(); > + return; > > for (i = 0; i < NUM_TRIAL_SAMPLES - 1; ++i) { > stack.entropy = random_get_entropy(); > > And then ran it, and now we get the lockup from the idle process: Yep that rules out the random code. And really if it was calling schedule() it shouldn't be getting a softlockup anyway. Thanks, Nick
On Thu, Oct 13, 2022 at 03:03:14PM +1000, Nicholas Piggin wrote: > On Thu Oct 13, 2022 at 10:21 AM AEST, Guenter Roeck wrote: > > On Thu, Oct 13, 2022 at 11:03:34AM +1100, Michael Ellerman wrote: > > > Guenter Roeck <linux@roeck-us.net> writes: > > > > On Wed, Oct 12, 2022 at 11:20:38AM -0600, Jason A. Donenfeld wrote: > > > >> > > > >> I've also managed to not hit this bug a few times. When it triggers, > > > >> after "kprobes: kprobe jump-optimization is enabled. All kprobes are > > > >> optimized if possible.", there's a long hang - tens seconds before it > > > >> continues. When it doesn't trigger, there's no hang at that point in the > > > >> boot process. > > > >> > > > > > > > > I managed to bisect the problem. See below for results. Reverting the > > > > offending patch fixes the problem for me. > > > > > > Thanks. > > > > > > This is probably down to me/us not testing with PREEMPT enabled enough. > > > > > Not sure. My configuration has > > > > CONFIG_PREEMPT_NONE=y > > # CONFIG_PREEMPT_VOLUNTARY is not set > > # CONFIG_PREEMPT is not set > > Thanks very much for helping with this. The config snippet you posted here > https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-October/249758.html > has CONFIG_PREEMPT=y. How do you turn that into a .config, olddefconfig? > > I can't reproduce this so far using your config and qemu command line, > but the patch you've bisected it to definitely could cause this. I'll > keep trying... Voila https://xn--4db.cc/dt00j0mt this repros it for me. > > Thanks, > Nick > > [...] > > > > # first bad commit: [e485f6c751e0a969327336c635ca602feea117f0] powerpc/64/interrupt: Fix return to masked context after hard-mask irq becomes pending >
On 10/12/22 22:03, Nicholas Piggin wrote: > On Thu Oct 13, 2022 at 10:21 AM AEST, Guenter Roeck wrote: >> On Thu, Oct 13, 2022 at 11:03:34AM +1100, Michael Ellerman wrote: >>> Guenter Roeck <linux@roeck-us.net> writes: >>>> On Wed, Oct 12, 2022 at 11:20:38AM -0600, Jason A. Donenfeld wrote: >>>>> >>>>> I've also managed to not hit this bug a few times. When it triggers, >>>>> after "kprobes: kprobe jump-optimization is enabled. All kprobes are >>>>> optimized if possible.", there's a long hang - tens seconds before it >>>>> continues. When it doesn't trigger, there's no hang at that point in the >>>>> boot process. >>>>> >>>> >>>> I managed to bisect the problem. See below for results. Reverting the >>>> offending patch fixes the problem for me. >>> >>> Thanks. >>> >>> This is probably down to me/us not testing with PREEMPT enabled enough. >>> >> Not sure. My configuration has >> >> CONFIG_PREEMPT_NONE=y >> # CONFIG_PREEMPT_VOLUNTARY is not set >> # CONFIG_PREEMPT is not set > > Thanks very much for helping with this. The config snippet you posted here > https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-October/249758.html > has CONFIG_PREEMPT=y. How do you turn that into a .config, olddefconfig? > > I can't reproduce this so far using your config and qemu command line, > but the patch you've bisected it to definitely could cause this. I'll > keep trying... > Uuh, sorry, I think I got confused with running multiple bisects on the same branch, and took the above from a different bisect run. You are correct, PREEMPT is enabled in the configuration. Timing is definitely involved; I see the problem more often on a loaded system. To bisect it, I had to repeat the test for each bisect step several times (I set the limit to 20 retries; that gave me reliable results). Guenter
On Thu Oct 13, 2022 at 10:21 AM AEST, Guenter Roeck wrote: > On Thu, Oct 13, 2022 at 11:03:34AM +1100, Michael Ellerman wrote: > > Guenter Roeck <linux@roeck-us.net> writes: > > > On Wed, Oct 12, 2022 at 11:20:38AM -0600, Jason A. Donenfeld wrote: > > >> > > >> I've also managed to not hit this bug a few times. When it triggers, > > >> after "kprobes: kprobe jump-optimization is enabled. All kprobes are > > >> optimized if possible.", there's a long hang - tens seconds before it > > >> continues. When it doesn't trigger, there's no hang at that point in the > > >> boot process. > > >> > > > > > > I managed to bisect the problem. See below for results. Reverting the > > > offending patch fixes the problem for me. > > > > Thanks. > > > > This is probably down to me/us not testing with PREEMPT enabled enough. > > > Not sure. My configuration has > > CONFIG_PREEMPT_NONE=y > # CONFIG_PREEMPT_VOLUNTARY is not set > # CONFIG_PREEMPT is not set Okay I reproduced it, just takes a while to hit. Thanks, Nick
On Thu, Oct 13, 2022 at 03:14:08PM +1000, Nicholas Piggin wrote: > > > > > > BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1 > > > caller is .__flush_tlb_pending+0x40/0xf0 > > > CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.0.0-28380-gde492c83cae0-dirty #4 > > > Hardware name: PowerMac3,1 PPC970FX 0x3c0301 PowerMac > > > Call Trace: > > > [c0000000044c3540] [c000000000f93ef0] .dump_stack_lvl+0x7c/0xc4 (unreliable) > > > [c0000000044c35d0] [c000000000fc9550] .check_preemption_disabled+0x140/0x150 > > > [c0000000044c3660] [c000000000073dd0] .__flush_tlb_pending+0x40/0xf0 > > > [c0000000044c36f0] [c000000000334434] .__apply_to_page_range+0x764/0xa30 > > > [c0000000044c3840] [c00000000006cad0] .change_memory_attr+0xf0/0x160 > > > [c0000000044c38d0] [c0000000002a1d70] .bpf_prog_select_runtime+0x150/0x230 > > > [c0000000044c3970] [c000000000d405d4] .bpf_prepare_filter+0x504/0x6f0 > > > [c0000000044c3a30] [c000000000d4085c] .bpf_prog_create+0x9c/0x140 > > > [c0000000044c3ac0] [c000000002051d9c] .ptp_classifier_init+0x44/0x78 > > > [c0000000044c3b50] [c000000002050f3c] .sock_init+0xe0/0x100 > > > [c0000000044c3bd0] [c000000000010bd4] .do_one_initcall+0xa4/0x438 > > > [c0000000044c3cc0] [c000000002005008] .kernel_init_freeable+0x378/0x428 > > > [c0000000044c3da0] [c0000000000113d8] .kernel_init+0x28/0x1a0 > > > [c0000000044c3e10] [c00000000000ca3c] .ret_from_kernel_thread+0x58/0x60 > > > > > > This in turn is because __flush_tlb_pending() calls: > > > > > > static inline int mm_is_thread_local(struct mm_struct *mm) > > > { > > > return cpumask_equal(mm_cpumask(mm), > > > cpumask_of(smp_processor_id())); > > > } > > > > > > __flush_tlb_pending() has a comment about this: > > > > > > * Must be called from within some kind of spinlock/non-preempt region... > > > */ > > > void __flush_tlb_pending(struct ppc64_tlb_batch *batch) > > > > > > So I guess that didn't happen for some reason? Maybe this is indicative > > > of some lock imbalance that then gets hit later? > > > > I managed to bisect that problem. Unfortunately it points to the > > scheduler merge. No idea what to do about that. Any idea ? > > I am copying Peter and Ingo for comments. > > > > > # first bad commit: [30c999937f69abf935b0228b8411713737377d9e] Merge tag 'sched-core-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip > > This might be a red herring because I can reproduce without it. > I think we can fix this with some preempt critical sections, they > don't look too much of a problem. > Do you refer to the bisect of the BUG: message above, or to the other problem ? I can try to repeat the bisect with some retries if you think that 30c999937f69a isn't responsible for "BUG: using smp_processor_id() in preemptible [00000000] code". Thanks, Guenter