
[bionic:linux-azure,0/1] Kdump freezes on 4.15 kernel in Azure

Message ID 20210723181453.18987-1-gpiccoli@canonical.com
Series: Kdump freezes on 4.15 kernel in Azure

Message

Guilherme G. Piccoli July 23, 2021, 6:14 p.m. UTC
BugLink: https://bugs.launchpad.net/bugs/1882623

[Impact]

* When kdumping on trusty/4.15 in an Azure instance, we quite frequently observe
a stall in the kdump kernel: it gets blocked, and soon we see a stack trace like
the following:

[ 65.452007] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 65.456004] 1-...!: (0 ticks this GP) idle=488/0/0 softirq=1/1 fqs=0
[ 65.456004] (detected by 0, t=15002 jiffies, g=707, c=706, q=8457)
[ 65.456004] rcu_sched kthread starved for 15002 jiffies! g707 c706 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=1

* Using the Azure serial console, we collected a sysrq-w trace when the issue happens:
[ 529.515013] sysrq: Show Blocked State
[ 529.517730] task PC stack pid father
[ 529.519006] kworker/u4:2 D 0 94 2 0x80000000
[ 529.519006] Workqueue: events_unbound fsnotify_mark_destroy_workfn
[ 529.519006] Call Trace:
[ 529.519006] __schedule+0x292/0x880
[ 529.519006] schedule+0x36/0x80
[ 529.519006] schedule_timeout+0x1d5/0x2f0
[ 529.519006] ? check_preempt_wakeup+0x162/0x260
[ 529.519006] wait_for_completion+0xa5/0x110
[ 529.519006] ? wake_up_q+0x80/0x80
[ 529.519006] __synchronize_srcu.part.14+0x67/0x80
[ 529.519006] ? trace_raw_output_rcu_utilization+0x50/0x50
[ 529.519006] ? __switch_to_asm+0x41/0x70
[ 529.519006] synchronize_srcu+0xd1/0xd6
[ 529.519006] fsnotify_mark_destroy_workfn+0x6d/0xc0
[ 529.519006] process_one_work+0x14e/0x390
[ 529.519006] worker_thread+0x1cc/0x3d0
[ 529.519006] kthread+0x105/0x140
[ 529.519006] ? max_active_store+0x60/0x60
[ 529.519006] ? kthread_bind+0x20/0x20
[ 529.519006] ret_from_fork+0x35/0x40
[ 529.519006] udevadm D 0 544 1 0x00000000
[ 529.519006] Call Trace:
[ 529.519006] __schedule+0x292/0x880
[ 529.519006] schedule+0x36/0x80
[ 529.519006] schedule_timeout+0x1d5/0x2f0
[ 529.519006] ? try_to_wake_up+0x4a/0x460
[ 529.519006] ? try_to_wake_up+0x4a/0x460
[ 529.519006] wait_for_completion+0xa5/0x110
[ 529.519006] ? wake_up_q+0x80/0x80
[ 529.519006] __flush_work.isra.29+0x119/0x1b0
[ 529.519006] ? destroy_worker+0x90/0x90
[ 529.519006] flush_delayed_work+0x3f/0x50
[ 529.519006] fsnotify_wait_marks_destroyed+0x15/0x20
[ 529.519006] fsnotify_destroy_group+0x4e/0xc0
[ 529.519006] inotify_release+0x1e/0x50
[ 529.519006] __fput+0xea/0x220
[ 529.519006] ____fput+0xe/0x10
[ 529.519006] task_work_run+0x8c/0xb0
[ 529.519006] exit_to_usermode_loop+0x70/0xa9
[ 529.519006] do_syscall_64+0x1b5/0x1e0
[ 529.519006] entry_SYSCALL_64_after_hwframe+0x41/0xa6
[ 529.519006] dhclient D 0 573 572 0x00000000
[ 529.519006] Call Trace:
[ 529.519006] __schedule+0x292/0x880
[ 529.519006] schedule+0x36/0x80
[ 529.519006] schedule_timeout+0x1d5/0x2f0
[ 529.519006] ? aa_profile_af_perm+0xb4/0xf0
[ 529.519006] wait_for_completion+0xa5/0x110
[ 529.519006] ? wake_up_q+0x80/0x80
[ 529.519006] __wait_rcu_gp+0x123/0x150
[ 529.519006] synchronize_sched+0x4e/0x60
[ 529.519006] ? __call_rcu+0x2f0/0x2f0
[ 529.519006] ? trace_raw_output_rcu_utilization+0x50/0x50
[ 529.519006] synchronize_net+0x1c/0x30
[ 529.519006] __unregister_prot_hook+0xcd/0xf0
[ 529.519006] packet_do_bind+0x1bd/0x250
[ 529.519006] packet_bind+0x2f/0x50
[ 529.519006] SYSC_bind+0xd8/0x110
[ 529.519006] ? sock_alloc_file+0x91/0x130
[ 529.519006] SyS_bind+0xe/0x10
[ 529.519006] do_syscall_64+0x80/0x1e0
[ 529.519006] entry_SYSCALL_64_after_hwframe+0x41/0xa6

* Bisecting mainline kernels, we found that v4.17-rc1 didn't reproduce the issue,
whereas v4.16 did. A fine-grained git bisect then led us to the fix: the
following patch, when backported to a problematic version, resolves the issue:
d8e462e19305 ("Drivers: hv: vmbus: Implement Direct Mode for stimer0")

* In Azure/Hyper-V, before the aforementioned commit, timer interrupts were
delivered to the guest through a VMBus message, the communication mechanism
between Hyper-V guests and the hypervisor. With the patch, a check is made
(through an MSR-like mechanism) and, if the hypervisor supports it, a direct
timer IRQ mechanism is put in place instead of the VMBus channel.
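The capability check described above can be sketched in plain userspace C. The constant below is a hypothetical mirror of the feature bit the upstream commit tests (named after HV_STIMER_DIRECT_MODE_AVAILABLE); the exact bit position is an assumption for illustration, not the kernel's actual definition:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of HV_STIMER_DIRECT_MODE_AVAILABLE: the hypervisor
 * advertises direct-mode synthetic timers via a feature bit in its
 * features word (bit 19 here is an assumption for illustration). */
#define STIMER_DIRECT_MODE_AVAILABLE (1u << 19)

/* Pick the stimer0 interrupt delivery path from the advertised feature
 * word: direct timer IRQ when the bit is set, legacy VMBus-message
 * delivery otherwise. */
bool stimer0_use_direct_mode(uint32_t misc_features)
{
    return (misc_features & STIMER_DIRECT_MODE_AVAILABLE) != 0;
}
```

For example, `stimer0_use_direct_mode(0)` selects the legacy VMBus path, while a features word with the bit set selects the direct IRQ path. In the real driver the direct path additionally wires up a dedicated interrupt vector and acknowledges the interrupt through a synthetic MSR; only the capability check is modeled here.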

* Our theory is that in the kdump kernel, especially due to its single-CPU
nature, the VMBus-messaged timer IRQ could interfere with scheduling and create
a deadlock condition, which is what we observe in the stack traces. Hence, we
propose backporting this patch to the Azure 4.15 kernels.

* Our hypothesis for why this wasn't observed in Bionic is that changes in the
kdump mechanism make it start earlier, which might hide the scheduling issue.
The issue may also simply have gone unnoticed due to lack of testing.

[Test Case]

* Perform a kdump on Trusty/4.15; after some rounds, you should see the
RCU stall trace above. With the patch, no issue is observed, and we could
perform ~30 kdumps in a row.

[Where problems could occur]

* Since the patch touches IRQ code (although restricted to Hyper-V), it's a
somewhat invasive change. It could lead to issues in the IRQ subsystem, although
the patch has been upstream since 2018 and no follow-up fixes for it were found
(we rely on Fixes: tags to track that).


Michael Kelley (1):
  Drivers: hv: vmbus: Implement Direct Mode for stimer0

 arch/x86/entry/entry_32.S          |  3 ++
 arch/x86/entry/entry_64.S          |  3 ++
 arch/x86/include/asm/hardirq.h     |  3 ++
 arch/x86/include/asm/hyperv-tlfs.h |  3 ++
 arch/x86/include/asm/irq_vectors.h |  3 +-
 arch/x86/include/asm/mshyperv.h    | 13 +++++++
 arch/x86/kernel/cpu/mshyperv.c     | 40 ++++++++++++++++++++
 arch/x86/kernel/irq.c              |  7 ++++
 drivers/hv/hv.c                    | 59 ++++++++++++++++++++++++++++--
 drivers/hv/hyperv_vmbus.h          |  4 +-
 10 files changed, 133 insertions(+), 5 deletions(-)

Comments

Tim Gardner July 28, 2021, 11:53 a.m. UTC | #1
Acked-by: Tim Gardner <tim.gardner@canonical.com>

The target kernel is really bionic:linux-azure-4.15

Guilherme G. Piccoli July 28, 2021, 6:59 p.m. UTC | #2
On Wed, Jul 28, 2021 at 8:53 AM Tim Gardner <tim.gardner@canonical.com> wrote:
>
> Acked-by: Tim Gardner <tim.gardner@canonical.com>
>
> The target kernel is really bionic:linux-azure-4.15

Agreed! And all 4.15 derivatives, like Xenial and (especially) Trusty.
Thanks for the ACK, Krzysztof and Tim.
Cheers,


Guilherme
Kelsey Skunberg Aug. 21, 2021, 2:26 a.m. UTC | #3
Applied to B/azure-4.15 master-next. Thank you!

-Kelsey
