
[SRU,EOAN,0/1] Fix for LP:#1852663

Message ID 20191115035403.20173-1-gerald.yang@canonical.com

Message

Gerald Yang Nov. 15, 2019, 3:54 a.m. UTC
BugLink: https://bugs.launchpad.net/bugs/1852663

[Impact]
When VFs are assigned to VMs and the VMs are then deleted, a general protection fault occurs in i40e_config_vf_promiscuous_mode:

general protection fault: 0000 [#1] SMP PTI
CPU: 54 PID: 6200 Comm: libvirtd Not tainted 5.3.0-21-generic #22~18.04.1-Ubuntu
Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10, BIOS U30 05/21/2019
RIP: 0010:i40e_config_vf_promiscuous_mode+0x172/0x330 [i40e]
Code: 48 8b 00 83 d1 00 48 85 c0 75 ef 49 83 c4 08 4c 39 e6 75 dd 85 c9 74 73 0f b6 45 c0 45 31 d2 89 45 d0 4d 8b 3e 4d 85 ff 74 53 <41> 0f b7 4f 16 66 81 f9 ff 0f 77 3f 0f b7 b3 ea 0c 00 00 8b 55 d0
RSP: 0018:ffffb987b5c77760 EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffff9bb5df5a9000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000006000000 RDI: ffff9bace4bce350
RBP: ffffb987b5c777b0 R08: 0000000000000000 R09: ffff9bace56a9da0
R10: 0000000000000000 R11: 0000000000000100 R12: ffff9bb5df5a9a28
R13: ffff9bace4bce008 R14: ffff9bb5df5a9338 R15: 26c2b975d54f5980
FS: 00007f9f07fff700(0000) GS:ffff9bfcff480000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fa73c9c0e10 CR3: 000000f6ab37a002 CR4: 00000000007626e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
i40e_ndo_set_vf_port_vlan+0x1a2/0x440 [i40e]
do_setlink+0x53f/0xee0
? update_load_avg+0x596/0x620
? update_curr+0x7a/0x1d0
? __switch_to_asm+0x40/0x70
? __switch_to_asm+0x34/0x70
? __switch_to_asm+0x40/0x70
? __switch_to_asm+0x34/0x70
rtnl_setlink+0x113/0x150
rtnetlink_rcv_msg+0x296/0x340
? aa_label_sk_perm.part.4+0x10f/0x160
? _cond_resched+0x19/0x40
? rtnl_calcit.isra.30+0x120/0x120
netlink_rcv_skb+0x51/0x120
rtnetlink_rcv+0x15/0x20
netlink_unicast+0x1a4/0x250
netlink_sendmsg+0x2d7/0x3d0
sock_sendmsg+0x63/0x70
___sys_sendmsg+0x2a9/0x320
? aa_label_sk_perm.part.4+0x10f/0x160
? _raw_spin_unlock_bh+0x1e/0x20
? release_sock+0x8f/0xa0
__sys_sendmsg+0x63/0xa0
? __sys_sendmsg+0x63/0xa0
__x64_sys_sendmsg+0x1f/0x30
do_syscall_64+0x5a/0x130
entry_SYSCALL_64_after_hwframe+0x44/0xa9
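
One possible reading of the dump above, offered as an interpretation rather than something stated in the report: R15 holds 26c2b975d54f5980, which is not a canonical x86-64 address under 48-bit virtual addressing, and the bytes at the faulting position (41 0f b7 4f 16) appear to decode to a 16-bit load relative to R15. Dereferencing a non-canonical pointer raises a general protection fault instead of a page fault, which would match the oops type. The helper below is a small userspace sketch of that canonical-address check. The function name is hypothetical, and it assumes 4-level paging (48-bit virtual addresses).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true if addr is canonical under 48-bit
 * virtual addressing, i.e. bits 63..47 are all zeros or all ones.
 * Assumes 4-level paging; with 5-level paging the rule covers bits 63..56.
 */
static bool is_canonical_48(uint64_t addr)
{
    uint64_t upper = addr >> 47;    /* the 17 bits 63..47 */

    return upper == 0 || upper == 0x1FFFF;
}
```

Applied to the register values in the dump, RBX (ffff9bb5df5a9000) passes the check while R15 does not, which is consistent with R15 being a stale or corrupted pointer.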

This issue also happens when deleting a k8s pod that uses a VF. A bug describing it was reported on the e1000-devel mailing list:
https://sourceforge.net/p/e1000/mailman/message/36766306/

The fix, suggested by Billy McFall, adds protection when accessing the hash list (vsi->mac_filter_hash). It is not upstream yet.
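
As a rough illustration of what "protection when accessing the hash list" means in general, here is a minimal userspace sketch, not the actual patch. It uses a pthread mutex where a kernel driver would use its own locking primitives, and every name in it (struct filter, filter_table, table_add, table_clear, table_count_vlans) is hypothetical. The point is only that the reader holds the same lock as the writer for the whole walk, so an entry cannot be freed while it is being examined.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

#define NBUCKETS 8
#define MAX_VLANID 4095

/* Hypothetical stand-in for one filter entry in the hash list. */
struct filter {
    struct filter *next;
    int16_t vlan;
};

struct filter_table {
    pthread_mutex_t lock;               /* guards every bucket chain */
    struct filter *buckets[NBUCKETS];
};

static void table_init(struct filter_table *t)
{
    pthread_mutex_init(&t->lock, NULL);
    for (int i = 0; i < NBUCKETS; i++)
        t->buckets[i] = NULL;
}

/* Writer: insert a new entry at the head of its bucket, under the lock. */
static int table_add(struct filter_table *t, int16_t vlan)
{
    struct filter *f = malloc(sizeof(*f));
    unsigned int b = (unsigned int)(uint16_t)vlan % NBUCKETS;

    if (!f)
        return -1;
    f->vlan = vlan;
    pthread_mutex_lock(&t->lock);
    f->next = t->buckets[b];
    t->buckets[b] = f;
    pthread_mutex_unlock(&t->lock);
    return 0;
}

/* Writer: free every entry, as a teardown path might do. */
static void table_clear(struct filter_table *t)
{
    pthread_mutex_lock(&t->lock);
    for (int i = 0; i < NBUCKETS; i++) {
        struct filter *f = t->buckets[i];

        while (f) {
            struct filter *next = f->next;

            free(f);
            f = next;
        }
        t->buckets[i] = NULL;
    }
    pthread_mutex_unlock(&t->lock);
}

/*
 * Reader: count entries carrying a valid VLAN id. Taking the lock for
 * the whole traversal keeps a concurrent table_clear() from freeing an
 * entry while this loop still points at it.
 */
static int table_count_vlans(struct filter_table *t)
{
    int n = 0;

    pthread_mutex_lock(&t->lock);
    for (int i = 0; i < NBUCKETS; i++)
        for (struct filter *f = t->buckets[i]; f; f = f->next)
            if (f->vlan >= 0 && f->vlan <= MAX_VLANID)
                n++;
    pthread_mutex_unlock(&t->lock);
    return n;
}
```

Without the lock in the reader, a teardown running on another thread could free an entry between the load of the pointer and the read of its vlan field, which is the kind of use-after-free that can surface as a fault on a garbage pointer.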

[Test Case]
Spin up some VMs with VFs, then delete all VMs

[Regression Potential]
Low. The fix adds protection around a hash list and is not expected to cause regressions.

Gerald Yang (1):
  UBUNTU: SAUCE: i40e: fix GPF when deleting VM

 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Comments

Khalid Elmously Nov. 22, 2019, 2:03 a.m. UTC | #1
Seth Forshee Dec. 6, 2019, 4:26 a.m. UTC | #2
On Fri, Nov 15, 2019 at 11:54:02AM +0800, Gerald Yang wrote:

Applied to unstable/master, thanks!