Message ID: 20210723181453.18987-1-gpiccoli@canonical.com
Series: Kdump freezes on 4.15 kernel in Azure
Acked-by: Tim Gardner <tim.gardner@canonical.com>

The target kernel is really bionic:linux-azure-4.15

On 7/23/21 12:14 PM, Guilherme G. Piccoli wrote:
> BugLink: https://bugs.launchpad.net/bugs/1882623
>
> [Impact]
>
> * When kdumping on trusty/4.15 in an Azure instance, we quite frequently
> observe a stall in the kdump kernel: it gets blocked, and soon we see a
> stack trace like the following:
>
> [ 65.452007] INFO: rcu_sched detected stalls on CPUs/tasks:
> [ 65.456004] 1-...!: (0 ticks this GP) idle=488/0/0 softirq=1/1 fqs=0
> [ 65.456004] (detected by 0, t=15002 jiffies, g=707, c=706, q=8457)
> [ 65.456004] rcu_sched kthread starved for 15002 jiffies! g707 c706 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=1
>
> * Using the Azure serial console, we collected a sysrq-w when the issue happens:
> [ 529.515013] sysrq: Show Blocked State
> [ 529.517730] task PC stack pid father
> [ 529.519006] kworker/u4:2 D 0 94 2 0x80000000
> [ 529.519006] Workqueue: events_unbound fsnotify_mark_destroy_workfn
> [ 529.519006] Call Trace:
> [ 529.519006] __schedule+0x292/0x880
> [ 529.519006] schedule+0x36/0x80
> [ 529.519006] schedule_timeout+0x1d5/0x2f0
> [ 529.519006] ? check_preempt_wakeup+0x162/0x260
> [ 529.519006] wait_for_completion+0xa5/0x110
> [ 529.519006] ? wake_up_q+0x80/0x80
> [ 529.519006] __synchronize_srcu.part.14+0x67/0x80
> [ 529.519006] ? trace_raw_output_rcu_utilization+0x50/0x50
> [ 529.519006] ? __switch_to_asm+0x41/0x70
> [ 529.519006] synchronize_srcu+0xd1/0xd6
> [ 529.519006] fsnotify_mark_destroy_workfn+0x6d/0xc0
> [ 529.519006] process_one_work+0x14e/0x390
> [ 529.519006] worker_thread+0x1cc/0x3d0
> [ 529.519006] kthread+0x105/0x140
> [ 529.519006] ? max_active_store+0x60/0x60
> [ 529.519006] ? kthread_bind+0x20/0x20
> [ 529.519006] ret_from_fork+0x35/0x40
> [ 529.519006] udevadm D 0 544 1 0x00000000
> [ 529.519006] Call Trace:
> [ 529.519006] __schedule+0x292/0x880
> [ 529.519006] schedule+0x36/0x80
> [ 529.519006] schedule_timeout+0x1d5/0x2f0
> [ 529.519006] ? try_to_wake_up+0x4a/0x460
> [ 529.519006] ? try_to_wake_up+0x4a/0x460
> [ 529.519006] wait_for_completion+0xa5/0x110
> [ 529.519006] ? wake_up_q+0x80/0x80
> [ 529.519006] __flush_work.isra.29+0x119/0x1b0
> [ 529.519006] ? destroy_worker+0x90/0x90
> [ 529.519006] flush_delayed_work+0x3f/0x50
> [ 529.519006] fsnotify_wait_marks_destroyed+0x15/0x20
> [ 529.519006] fsnotify_destroy_group+0x4e/0xc0
> [ 529.519006] inotify_release+0x1e/0x50
> [ 529.519006] __fput+0xea/0x220
> [ 529.519006] ____fput+0xe/0x10
> [ 529.519006] task_work_run+0x8c/0xb0
> [ 529.519006] exit_to_usermode_loop+0x70/0xa9
> [ 529.519006] do_syscall_64+0x1b5/0x1e0
> [ 529.519006] entry_SYSCALL_64_after_hwframe+0x41/0xa6
> [ 529.519006] dhclient D 0 573 572 0x00000000
> [ 529.519006] Call Trace:
> [ 529.519006] __schedule+0x292/0x880
> [ 529.519006] schedule+0x36/0x80
> [ 529.519006] schedule_timeout+0x1d5/0x2f0
> [ 529.519006] ? aa_profile_af_perm+0xb4/0xf0
> [ 529.519006] wait_for_completion+0xa5/0x110
> [ 529.519006] ? wake_up_q+0x80/0x80
> [ 529.519006] __wait_rcu_gp+0x123/0x150
> [ 529.519006] synchronize_sched+0x4e/0x60
> [ 529.519006] ? __call_rcu+0x2f0/0x2f0
> [ 529.519006] ? trace_raw_output_rcu_utilization+0x50/0x50
> [ 529.519006] synchronize_net+0x1c/0x30
> [ 529.519006] __unregister_prot_hook+0xcd/0xf0
> [ 529.519006] packet_do_bind+0x1bd/0x250
> [ 529.519006] packet_bind+0x2f/0x50
> [ 529.519006] SYSC_bind+0xd8/0x110
> [ 529.519006] ? sock_alloc_file+0x91/0x130
> [ 529.519006] SyS_bind+0xe/0x10
> [ 529.519006] do_syscall_64+0x80/0x1e0
> [ 529.519006] entry_SYSCALL_64_after_hwframe+0x41/0xa6
>
> * Bisecting mainline kernels, we found that v4.17-rc1 did not reproduce the
> issue, whereas v4.16 did. A fine-grained git bisect then led us to the fix;
> the following patch, when backported to a problematic version, resolves
> the issue:
> d8e462e19305 ("Drivers: hv: vmbus: Implement Direct Mode for stimer0")
>
> * In Azure/Hyper-V, before the aforementioned commit, timer interrupts were
> delivered through a VMbus message, the communication mechanism between
> Hyper-V guests and the hypervisor. With the patch, a check is made (through
> an MSR-like mechanism) and, if the hypervisor supports it, a direct timer
> IRQ mechanism is put in place instead of the VMbus channel.
>
> * Our theory is that in the kdump kernel, especially due to its single-CPU
> nature, the VMbus-messaged timer IRQ could interfere with scheduling and
> create a deadlock condition, which is what we observe in the stack traces.
> Hence we propose backporting this patch to the Azure 4.15 kernels.
>
> * Our hypothesis about why this wasn't observed in Bionic is that changes
> in the kdump mechanism make it start earlier, which might hide the
> scheduling issue. A lack of testing could also explain why it was never
> observed there.
>
> [Test Case]
>
> * Perform a kdump in Trusty/4.15 and, after some rounds, you should see the
> mentioned RCU stack trace. With the patch, no issue is observed; we could
> perform ~30 kdumps in a row.
>
> [Where problems could occur]
>
> * Since the patch touches IRQ code (although restricted to Hyper-V), it's a
> somewhat invasive change. It could lead to issues in the IRQ subsystem,
> although the patch has been upstream since 2018 and no fixes for it were
> found (we rely on the Fixes tag for that).
>
>
> Michael Kelley (1):
>   Drivers: hv: vmbus: Implement Direct Mode for stimer0
>
>  arch/x86/entry/entry_32.S          |  3 ++
>  arch/x86/entry/entry_64.S          |  3 ++
>  arch/x86/include/asm/hardirq.h     |  3 ++
>  arch/x86/include/asm/hyperv-tlfs.h |  3 ++
>  arch/x86/include/asm/irq_vectors.h |  3 +-
>  arch/x86/include/asm/mshyperv.h    | 13 +++++++
>  arch/x86/kernel/cpu/mshyperv.c     | 40 ++++++++++++++++++++
>  arch/x86/kernel/irq.c              |  7 ++++
>  drivers/hv/hv.c                    | 59 ++++++++++++++++++++++++++++--
>  drivers/hv/hyperv_vmbus.h          |  4 +-
>  10 files changed, 133 insertions(+), 5 deletions(-)
>
On Wed, Jul 28, 2021 at 8:53 AM Tim Gardner <tim.gardner@canonical.com> wrote:
>
> Acked-by: Tim Gardner <tim.gardner@canonical.com>
>
> The target kernel is really bionic:linux-azure-4.15

Agreed! And all 4.15 derivatives, like Xenial and (especially) Trusty.
Thanks for the ACK, Krzysztof and Tim.
Cheers,

Guilherme
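[Editorial note] One round of the [Test Case] loop from the cover letter can be sketched in shell as below. This is a sketch under assumptions: it presumes a crash kernel has been loaded (e.g. via kexec -p or the linux-crashdump tooling), and it only prints the sysrq crash trigger rather than executing it, since running it panics the machine.

```shell
#!/bin/sh
# One kdump test round: verify a crash kernel is loaded, then trigger a panic.

crash_loaded=0
if [ -r /sys/kernel/kexec_crash_loaded ]; then
    # Standard kernel interface: reads 1 when a kdump kernel is loaded.
    crash_loaded="$(cat /sys/kernel/kexec_crash_loaded)"
fi

if [ "$crash_loaded" = "1" ]; then
    # In a real round you would execute this; it panics the running kernel
    # so the kdump kernel boots and (without the fix) may hit the RCU stall.
    echo "would run: echo c > /proc/sysrq-trigger"
else
    echo "kdump kernel not loaded; load one (e.g. kexec -p) before testing"
fi
```

After each round, the serial console output of the kdump kernel is what shows whether the rcu_sched stall appears.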
Applied to B/azure-4.15 master-next. Thank you!

-Kelsey

On 2021-07-23 15:14:52 , Guilherme G. Piccoli wrote:
> BugLink: https://bugs.launchpad.net/bugs/1882623
> [...]