mbox series

[iwl-net,v1,0/4] fix locking issue

Message ID 20241105184859.741473-1-tarun.k.singh@intel.com
Headers show
Series fix locking issue | expand

Message

Tarun K Singh Nov. 5, 2024, 6:48 p.m. UTC
This series fix deadlock issues in the driver. The first patch changes
argument of function 'idpf_vport_ctrl_lock' to adapter. The second patch
renames 'vport_ctrl_lock' to 'vport_cfg_lock'. The first 2 patches make the
third patch easier to review. The third patch fixes the locking issue,
and the fourth patch prevents lockdep from complaining.

Ahmed Zaki (1):
  idpf: add lock class key

Tarun K Singh (3):
  idpf: Change function argument
  idpf: rename vport_ctrl_lock
  idpf: Add init, reinit, and deinit control lock

 drivers/net/ethernet/intel/idpf/idpf.h        | 49 ++++++++----
 .../net/ethernet/intel/idpf/idpf_ethtool.c    | 59 +++++++-------
 drivers/net/ethernet/intel/idpf/idpf_lib.c    | 80 +++++++++++++------
 drivers/net/ethernet/intel/idpf/idpf_main.c   | 39 ++++++---
 4 files changed, 149 insertions(+), 78 deletions(-)

Comments

Singh, Krishneil K Jan. 27, 2025, 6:06 a.m. UTC | #1
> -----Original Message-----
> From: Tarun K Singh <tarun.k.singh@intel.com>
> Sent: Tuesday, November 5, 2024 10:49 AM
> To: intel-wired-lan@lists.osuosl.org
> Cc: netdev@vger.kernel.org
> Subject: [PATCH iwl-net v1 0/4] fix locking issue
> 
> This series fix deadlock issues in the driver. The first patch changes
> argument of function 'idpf_vport_ctrl_lock' to adapter. The second patch
> renames 'vport_ctrl_lock' to 'vport_cfg_lock'. The first 2 patches make the
> third patch easier to review. The third patch fixes the locking issue,
> and the fourth patch prevents lockdep from complaining.
> 
> Ahmed Zaki (1):
>   idpf: add lock class key
> 
> Tarun K Singh (3):
>   idpf: Change function argument
>   idpf: rename vport_ctrl_lock
>   idpf: Add init, reinit, and deinit control lock
> 
>  drivers/net/ethernet/intel/idpf/idpf.h        | 49 ++++++++----
>  .../net/ethernet/intel/idpf/idpf_ethtool.c    | 59 +++++++-------
>  drivers/net/ethernet/intel/idpf/idpf_lib.c    | 80 +++++++++++++------
>  drivers/net/ethernet/intel/idpf/idpf_main.c   | 39 ++++++---
>  4 files changed, 149 insertions(+), 78 deletions(-)
> 
> --
> 2.46.0
> 

After additional testing on this [series|patch] we discovered an issue after doing ifdown while device is in a reset, which can result in a <null pointer dereference(?)>. 

kernel: idpf 0000:af:00.0: resetting
kernel: idpf 0000:af:00.0: reset done
kernel: idpf 0000:af:00.0: HW reset detected
kernel: idpf 0000:af:00.0: Device HW Reset initiated
kernel: BUG: kernel NULL pointer dereference, address: 00000000000001a0
kernel: #PF: supervisor write access in kernel mode
kernel: #PF: error_code(0x0002) - not-present page
kernel: PGD 0 P4D 0
kernel: Oops: Oops: 0002 [#1] PREEMPT SMP NOPTI
kernel: CPU: 77 UID: 0 PID: 303278 Comm: ip Not tainted 6.13.0-rc7-p18.g5879fe0+ #1
kernel: Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0015.032120220358 03/21/2022
kernel: RIP: 0010:mutex_lock+0x1c/0x30
kernel: Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 55 48 89 fd e8 ce e7 ff ff 65 48 8b 15 c6 7b 20 6f 31 c0 <f0> 48 0f b1 55 00 75 06 5d c3 cc cc cc cc 48 89 ef 5d eb b0 90 90
kernel: RSP: 0018:ffffb73e43907768 EFLAGS: 00010246
kernel: RAX: 0000000000000000 RBX: ffff8df786a37000 RCX: 0000000000000000
kernel: RDX: ffff8de876922f80 RSI: 00000000fffffe00 RDI: 00000000000001a0
kernel: RBP: 00000000000001a0 R08: ffffb73e439076f8 R09: 0000000000000000
kernel: R10: ffff8de823b5e000 R11: 0000000000000100 R12: 0000000000000000
kernel: R13: ffff8df786a37188 R14: 0000000000000001 R15: 0000000000000001
kernel: FS:  00007fe6f6a30740(0000) GS:ffff8df73fe80000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 00000000000001a0 CR3: 000000011bb58006 CR4: 00000000007726f0
kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
kernel: PKRU: 55555554
kernel: Call Trace:
kernel:  <TASK>
kernel:  ? __die+0x24/0x70
kernel:  ? page_fault_oops+0x158/0x4f0
kernel:  ? exc_page_fault+0x77/0x190
kernel:  ? asm_exc_page_fault+0x26/0x30
kernel:  ? mutex_lock+0x1c/0x30
kernel:  ? mutex_lock+0x12/0x30
kernel:  idpf_stop+0x39/0x60 [idpf]
kernel:  __dev_close_many+0xa9/0x150
kernel:  __dev_change_flags+0x114/0x250
kernel:  dev_change_flags+0x25/0x60
kernel:  do_setlink.constprop.0+0x215/0x1030
kernel:  ? avc_has_perm_noaudit+0x6b/0xf0
kernel:  ? cred_has_capability.isra.0+0x78/0x120
kernel:  ? security_capable+0x58/0x90
kernel:  rtnl_newlink+0x7c2/0xb90
kernel:  ? avc_has_perm_noaudit+0x6b/0xf0
kernel:  ? cred_has_capability.isra.0+0x78/0x120
kernel:  ? rtnl_lock_killable+0x3/0x20
kernel:  ? netdev_run_todo+0x6f/0x590
kernel:  ? __pfx_rtnl_newlink+0x10/0x10
kernel:  rtnetlink_rcv_msg+0x361/0x410
kernel:  ? __pfx_rtnetlink_rcv_msg+0x10/0x10
kernel:  netlink_rcv_skb+0x58/0x110
kernel:  netlink_unicast+0x247/0x370
kernel:  netlink_sendmsg+0x1fa/0x440
kernel:  ____sys_sendmsg+0x3ac/0x3e0
kernel:  ? import_iovec+0x1a/0x20
kernel:  ? copy_msghdr_from_user+0x6d/0xa0
kernel:  ___sys_sendmsg+0x88/0xd0
kernel:  ? set_ptes.constprop.0+0x28/0x90
kernel:  ? __memcg_slab_free_hook+0xd9/0x120
kernel:  __sys_sendmsg+0x6c/0xc0
kernel:  do_syscall_64+0x64/0x170
kernel:  entry_SYSCALL_64_after_hwframe+0x76/0x7e
kernel: RIP: 0033:0x7fe6f6b5cc94
kernel: Code: 15 a1 41 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 80 3d 6d c9 0c 00 00 74 13 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 48 83 ec 28 89 54 24 1c 48 89
kernel: RSP: 002b:00007ffe89733418 EFLAGS: 00000202 ORIG_RAX: 000000000000002e
kernel: RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fe6f6b5cc94
kernel: RDX: 0000000000000000 RSI: 00007ffe89733480 RDI: 0000000000000003
kernel: RBP: 0000000000000003 R08: 0000000067941450 R09: 00005560018342a0
kernel: R10: 00007fe6f6c21ce0 R11: 0000000000000202 R12: 0000000067941451
kernel: R13: 0000555fd0b8d040 R14: 0000000000000001 R15: 0000000000000000
kernel:  </TASK>
kernel: Modules linked in: kheaders idpf libeth cmac nls_utf8 cifs cifs_arc4 nls_ucs2_utils cifs_md4 dns_resolver netfs xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge stp llc rpcrdma intel_rapl_msr intel_rapl_common skx_edac skx_edac_common nfit rfkill libnvdimm overlay x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm rapl intel_cstate iTCO_wdt ipmi_ssif iTCO_vendor_support vfat fat acpi_power_meter i2c_i801 intel_uncore joydev ipmi_si acpi_ipmi lpc_ich pcspkr i2c_smbus ipmi_devintf ioatdma ipmi_msghandler mei_me acpi_pad mei intel_pch_thermal dca nfsd auth_rpcgss nfs_acl lockd grace sunrpc zram ast drm_client_lib crct10dif_pclmul i2c_algo_bit crc32_pclmul drm_shmem_helper drm_kms_helper ghash_clmulni_intel nvme sha512_ssse3 i40e sha256_ssse3 drm nvme_core sha1_ssse3 libie wmi sctp libcrc32c crc32c_intel ip6_udp_tunnel udp_tunnel fuse
kernel: CR2: 00000000000001a0
kernel: ---[ end trace 0000000000000000 ]---


Thank You 
Krishneil Singh
Przemek Kitszel Jan. 27, 2025, 8:11 a.m. UTC | #2
On 1/27/25 07:06, Singh, Krishneil K wrote:
> 
>> -----Original Message-----
>> From: Tarun K Singh <tarun.k.singh@intel.com>
>> Sent: Tuesday, November 5, 2024 10:49 AM
>> To: intel-wired-lan@lists.osuosl.org
>> Cc: netdev@vger.kernel.org
>> Subject: [PATCH iwl-net v1 0/4] fix locking issue
>>
>> This series fix deadlock issues in the driver. The first patch changes
>> argument of function 'idpf_vport_ctrl_lock' to adapter. The second patch
>> renames 'vport_ctrl_lock' to 'vport_cfg_lock'. The first 2 patches make the
>> third patch easier to review. The third patch fixes the locking issue,
>> and the fourth patch prevents lockdep from complaining.
>>
>> Ahmed Zaki (1):
>>    idpf: add lock class key
>>
>> Tarun K Singh (3):
>>    idpf: Change function argument
>>    idpf: rename vport_ctrl_lock
>>    idpf: Add init, reinit, and deinit control lock
>>
>>   drivers/net/ethernet/intel/idpf/idpf.h        | 49 ++++++++----
>>   .../net/ethernet/intel/idpf/idpf_ethtool.c    | 59 +++++++-------
>>   drivers/net/ethernet/intel/idpf/idpf_lib.c    | 80 +++++++++++++------
>>   drivers/net/ethernet/intel/idpf/idpf_main.c   | 39 ++++++---
>>   4 files changed, 149 insertions(+), 78 deletions(-)
>>
>> --
>> 2.46.0
>>
> 
> After additional testing on this [series|patch] we discovered an issue after doing ifdown while device is in a reset, which can result in a <null pointer dereference(?)>.

Thank you for the report,
Could you please check if this issue could be reproduced without the
series?

We have yet another patch by Emil, that aims to fix such issues by
detaching the device prior to the reset.

@netdev
I would like to consider adding "in reset" state for the netdev,
to prevent such behavior in more civilized way though.
Jakub Kicinski Jan. 27, 2025, 5:42 p.m. UTC | #3
On Mon, 27 Jan 2025 09:11:41 +0100 Przemek Kitszel wrote:
> @netdev
> I would like to consider adding "in reset" state for the netdev,
> to prevent such behavior in more civilized way though.

Can you provide more context? We already have netif_device_detach()
Przemek Kitszel Jan. 27, 2025, 8:55 p.m. UTC | #4
On 1/27/25 18:42, Jakub Kicinski wrote:
> On Mon, 27 Jan 2025 09:11:41 +0100 Przemek Kitszel wrote:
>> @netdev
>> I would like to consider adding "in reset" state for the netdev,
>> to prevent such behavior in more civilized way though.
> 
> Can you provide more context? We already have netif_device_detach()
> 

getting -ENODEV on the device/interface that will soon be up again feels
unoptimal, and with all the risks of "poor driver undoing user config"
during the disablement I would like to see better.

perhaps with Emil's patch there will be also more context