[SRU,J,0/1] UBUNTU: SAUCE: fan: release rcu_read_lock on skb discard path

Message ID 20240927130416.188167-1-aleksandr.mikhalitsyn@canonical.com

Message

Aleksandr Mikhalitsyn Sept. 27, 2024, 1:04 p.m. UTC
BugLink: https://bugs.launchpad.net/bugs/2064176

SRU Justification:

[Impact]

A user can trigger a host crash on Jammy/Noble by launching
a container that uses an Ubuntu FAN network in LXD.

Aug 30 21:51:57 v1 kernel: ------------[ cut here ]------------
Aug 30 21:51:57 v1 kernel: Voluntary context switch within RCU read-side critical section!
Aug 30 21:51:57 v1 kernel: WARNING: CPU: 1 PID: 2669 at kernel/rcu/tree_plugin.h:320 rcu_note_context_switch+0x2ce/0x2f0
Aug 30 21:51:57 v1 kernel: Modules linked in: veth vxlan ip6_udp_tunnel udp_tunnel dummy nft_masq nft_chain_nat bridge stp llc zfs(PO) spl(O) nvme_fabrics nvme_core nvme_auth ebtable_filter ebtables ip6table_raw ip6table_mangle ip6table_nat ip6table_filter ip6_tables iptable_raw iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter nf_tables libcrc32c vhost_vsock vhost vhost_iotlb binfmt_misc kvm_amd ccp kvm irqbypass crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel sha256_ssse3 sha1_ssse3 nls_iso8859_1 joydev aesni_intel crypto_simd cryptd virtio_gpu 9pnet_virtio virtio_dma_buf xhci_pci psmouse ahci 9pnet virtiofs libahci vmw_vsock_virtio_transport xhci_pci_renesas vmw_vsock_virtio_transport_common vsock virtio_input input_leds serio_raw efi_pstore nfnetlink dmi_sysfs virtio_rng ip_tables x_tables autofs4
Aug 30 21:51:57 v1 kernel: CPU: 1 PID: 2669 Comm: systemd-resolve Tainted: P           O       6.8.0-41-generic #41-Ubuntu
Aug 30 21:51:57 v1 kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009)/LXD, BIOS unknown 2/2/2022
Aug 30 21:51:57 v1 kernel: RIP: 0010:rcu_note_context_switch+0x2ce/0x2f0
Aug 30 21:51:57 v1 kernel: Code: fe ff ff ba 02 00 00 00 be 01 00 00 00 e8 fa d0 fe ff e9 6b fe ff ff 48 c7 c7 60 7d a6 a8 c6 05 ab 99 61 02 01 e8 d2 0d f2 ff <0f> 0b e9 96 fd ff ff 0f 0b e9 36 ff ff ff 0f 0b e9 18 ff ff ff 66
Aug 30 21:51:57 v1 kernel: RSP: 0018:ffffb611812bbd80 EFLAGS: 00010046
Aug 30 21:51:57 v1 kernel: RAX: 0000000000000000 RBX: ffff9613faeb5a00 RCX: 0000000000000000
Aug 30 21:51:57 v1 kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Aug 30 21:51:57 v1 kernel: RBP: ffffb611812bbda0 R08: 0000000000000000 R09: 0000000000000000
Aug 30 21:51:57 v1 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
Aug 30 21:51:57 v1 kernel: R13: ffff9613b89dd200 R14: 0000000000000000 R15: 0000000000000000
Aug 30 21:51:57 v1 kernel: FS:  00007ec3a402c5c0(0000) GS:ffff9613fae80000(0000) knlGS:0000000000000000
Aug 30 21:51:57 v1 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 30 21:51:57 v1 kernel: CR2: 000062592dc892b8 CR3: 000000013890a000 CR4: 00000000007506f0
Aug 30 21:51:57 v1 kernel: PKRU: 55555554
Aug 30 21:51:57 v1 kernel: Call Trace:
Aug 30 21:51:57 v1 kernel:  <TASK>
Aug 30 21:51:57 v1 kernel:  ? show_regs+0x6d/0x80
Aug 30 21:51:57 v1 kernel:  ? __warn+0x89/0x160
Aug 30 21:51:57 v1 kernel:  ? rcu_note_context_switch+0x2ce/0x2f0
Aug 30 21:51:57 v1 kernel:  ? report_bug+0x17e/0x1b0
Aug 30 21:51:57 v1 kernel:  ? handle_bug+0x51/0xa0
Aug 30 21:51:57 v1 kernel:  ? exc_invalid_op+0x18/0x80
Aug 30 21:51:57 v1 kernel:  ? asm_exc_invalid_op+0x1b/0x20
Aug 30 21:51:57 v1 kernel:  ? rcu_note_context_switch+0x2ce/0x2f0
Aug 30 21:51:57 v1 kernel:  __schedule+0x81/0x6b0
Aug 30 21:51:57 v1 kernel:  schedule+0x33/0x110
Aug 30 21:51:57 v1 kernel:  syscall_exit_to_user_mode+0x22d/0x260
Aug 30 21:51:57 v1 kernel:  do_syscall_64+0x8c/0x180
Aug 30 21:51:57 v1 kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 30 21:51:57 v1 kernel:  ? syscall_exit_to_user_mode+0x89/0x260
Aug 30 21:51:57 v1 kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 30 21:51:57 v1 kernel:  ? do_syscall_64+0x8c/0x180
Aug 30 21:51:57 v1 kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 30 21:51:57 v1 kernel:  ? irqentry_exit_to_user_mode+0x7e/0x260
Aug 30 21:51:57 v1 kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 30 21:51:57 v1 kernel:  ? irqentry_exit+0x43/0x50
Aug 30 21:51:57 v1 kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Aug 30 21:51:57 v1 kernel:  ? exc_page_fault+0x94/0x1b0
Aug 30 21:51:57 v1 kernel:  entry_SYSCALL_64_after_hwframe+0x78/0x80
Aug 30 21:51:57 v1 kernel: RIP: 0033:0x7ec3a3f14887
Aug 30 21:51:57 v1 kernel: Code: 10 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
Aug 30 21:51:57 v1 kernel: RSP: 002b:00007ffcbb32de08 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
Aug 30 21:51:57 v1 kernel: RAX: 000000000000002d RBX: 000062592dc882b0 RCX: 00007ec3a3f14887
Aug 30 21:51:57 v1 kernel: RDX: 000000000000002d RSI: 000062592dc88360 RDI: 0000000000000011
Aug 30 21:51:57 v1 kernel: RBP: 000062592dc7e690 R08: 00007ffcbb32dde4 R09: 0000000000000000
Aug 30 21:51:57 v1 kernel: R10: 00000000000005aa R11: 0000000000000246 R12: 0000000000000011
Aug 30 21:51:57 v1 kernel: R13: 0000000000000002 R14: 000000000000002d R15: 000062592dc88360
Aug 30 21:51:57 v1 kernel:  </TASK>
Aug 30 21:51:57 v1 kernel: ---[ end trace 0000000000000000 ]---

[Fix]

The first proposed patch fixes RCU locking by releasing rcu_read_lock
on the skb discard code path.

The second patch just uses the proper helper, dev_core_stats_tx_dropped_inc(),
to increment the netdev's tx_dropped statistic.
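
For illustration only, here is a minimal userspace sketch of the class of
bug the first patch addresses. It is not the vxlan/FAN code: rcu_depth,
toy_rcu_read_lock(), toy_rcu_read_unlock() and toy_xmit() are hypothetical
names that model a read-side critical section as a simple nesting counter.
The point is that an early return on the discard path must drop the read
lock just as the normal path does.

```c
#include <stdbool.h>

/* Toy stand-in for the RCU read-side nesting depth (hypothetical model). */
static int rcu_depth;

static void toy_rcu_read_lock(void)   { rcu_depth++; }
static void toy_rcu_read_unlock(void) { rcu_depth--; }

/*
 * Hypothetical transmit helper. 'routable' selects the normal or the
 * discard path; 'fixed' selects whether the discard path releases the
 * read lock before returning.
 */
static int toy_xmit(bool routable, bool fixed)
{
	toy_rcu_read_lock();

	if (!routable) {
		/* Discard path: the packet is dropped here. */
		if (fixed)
			toy_rcu_read_unlock();
		return -1;
	}

	/* Normal path: transmit, then leave the read-side section. */
	toy_rcu_read_unlock();
	return 0;
}
```

In this model, toy_xmit(false, false) returns with rcu_depth still at 1,
which corresponds to a task remaining inside a read-side critical section
when it later schedules, as in the warning above. toy_xmit(false, true)
returns with rcu_depth back at 0.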

[Test Plan]

As provided by Max Asnaashari:

# Install LXD from channel latest/stable
snap install lxd --channel latest/stable

# Configure LXD
lxd init --auto

# Create a FAN network
lxc network create lxdfan0 bridge.mode=fan ipv4.nat=true

# Launch a container using the FAN network
lxc launch ubuntu-minimal:22.04 c1 --network lxdfan0

# Try to interact with LXD
lxc ls

[Where problems could occur]

The change is local and only affects the Ubuntu FAN code. I would not expect
any problems with this patchset.

Link: https://github.com/canonical/lxd/issues/14025
Link: https://lists.ubuntu.com/archives/kernel-team/2024-September/153551.html
Reported-by: Max Asnaashari <max.asnaashari@canonical.com>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>

Alexander Mikhalitsyn (1):
  UBUNTU: SAUCE: fan: release rcu_read_lock on skb discard path

 drivers/net/vxlan/vxlan_core.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Guoqing Jiang Sept. 30, 2024, 2:18 a.m. UTC | #1
On 9/27/24 21:04, Alexander Mikhalitsyn wrote:
> BugLink: https://bugs.launchpad.net/bugs/2064176
>
> SRU Justification:
>
> [Impact]
>
> A user can trigger a host crash on Jammy/Noble by launching
> a container that uses an Ubuntu FAN network in LXD.

If the patch also applies to Noble, please update the subject with
"[J/N]". Otherwise, it looks like you forgot to send the patch for Noble.

Guoqing
Aleksandr Mikhalitsyn Sept. 30, 2024, 8:43 a.m. UTC | #2
On Mon, Sep 30, 2024 at 4:18 AM Guoqing Jiang
<guoqing.jiang@canonical.com> wrote:
>
>
>
> On 9/27/24 21:04, Alexander Mikhalitsyn wrote:
> > BugLink: https://bugs.launchpad.net/bugs/2064176
> >
> > SRU Justification:
> >
> > [Impact]
> >
> > A user can trigger a host crash on Jammy/Noble by launching
> > a container that uses an Ubuntu FAN network in LXD.
>
> If the patch also applies to Noble, please update the subject with
> "[J/N]". Otherwise, it looks like you forgot to send the patch for Noble.

Patch for Noble is here
https://lists.ubuntu.com/archives/kernel-team/2024-September/153551.html

Kind regards,
Alex

>
> Guoqing
Guoqing Jiang Sept. 30, 2024, 9:14 a.m. UTC | #3
On 9/30/24 16:43, Aleksandr Mikhalitsyn wrote:
> On Mon, Sep 30, 2024 at 4:18 AM Guoqing Jiang
> <guoqing.jiang@canonical.com> wrote:
>>
>>
>> On 9/27/24 21:04, Alexander Mikhalitsyn wrote:
>>> BugLink: https://bugs.launchpad.net/bugs/2064176
>>>
>>> SRU Justification:
>>>
>>> [Impact]
>>>
>>> A user can trigger a host crash on Jammy/Noble by launching
>>> a container that uses an Ubuntu FAN network in LXD.
>> If the patch also applies to Noble, please update the subject with
>> "[J/N]". Otherwise, it looks like you forgot to send the patch for Noble.
> Patch for Noble is here
> https://lists.ubuntu.com/archives/kernel-team/2024-September/153551.html

Thanks for the link. I'd prefer to have both sent within one thread, but this
works too.

Thanks,
Guoqing
Guoqing Jiang Sept. 30, 2024, 9:15 a.m. UTC | #4
Acked-by: Guoqing Jiang <guoqing.jiang@canonical.com>
