Message ID | 20241022173527.87972-3-pavan.kumar.linga@intel.com |
---|---|
State | Superseded |
Headers | show |
Series | fix reset issues | expand |
On Tue, Oct 22, 2024 at 10:35:27AM -0700, Pavan Kumar Linga wrote: > In an event where the platform running the device control plane > is rebooted, reset is detected on the driver. It releases > all the resources and waits for the reset to complete. Once the > reset is done, it tries to build the resources back. At this > time if the device control plane is not yet started, then > the driver timeouts on the virtchnl message and retries to > establish the mailbox again. > > In the retry flow, mailbox is deinitialized but the mailbox > workqueue is still alive and polling for the mailbox message. > This results in accessing the released control queue leading to > null-ptr-deref. Fix it by unrolling the work queue cancellation > and mailbox deinitialization in the order which they got > initialized. > > Also remove the redundant scheduling of the mailbox task in > idpf_vc_core_init. I think it might be better to move this last change into a separate patch targeted at iwl rather than iwl-net. It isn't a fix, right? > > Fixes: 4930fbf419a7 ("idpf: add core init and interrupt request") > Fixes: 34c21fa894a1 ("idpf: implement virtchnl transaction manager") > Cc: stable@vger.kernel.org # 6.9+ > Reviewed-by: Tarun K Singh <tarun.k.singh@intel.com> > Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> ...
On 10/24/2024 7:39 AM, Simon Horman wrote: > On Tue, Oct 22, 2024 at 10:35:27AM -0700, Pavan Kumar Linga wrote: >> In an event where the platform running the device control plane >> is rebooted, reset is detected on the driver. It releases >> all the resources and waits for the reset to complete. Once the >> reset is done, it tries to build the resources back. At this >> time if the device control plane is not yet started, then >> the driver timeouts on the virtchnl message and retries to >> establish the mailbox again. >> >> In the retry flow, mailbox is deinitialized but the mailbox >> workqueue is still alive and polling for the mailbox message. >> This results in accessing the released control queue leading to >> null-ptr-deref. Fix it by unrolling the work queue cancellation >> and mailbox deinitialization in the order which they got >> initialized. >> >> Also remove the redundant scheduling of the mailbox task in >> idpf_vc_core_init. > > I think it might be better to move this last change into a separate patch > targeted at iwl rather than iwl-net. It isn't a fix, right? > If we do not make that change in this patch, it results in calling cancel_delayed_work_sync twice in case of an error in idpf_vc_core_init (err_intr_req). Looks like there aren't any side effects of calling cancel_delayed_work_sync twice. I will remove the said changes from this patch. Thanks, Pavan >> >> Fixes: 4930fbf419a7 ("idpf: add core init and interrupt request") >> Fixes: 34c21fa894a1 ("idpf: implement virtchnl transaction manager") >> Cc: stable@vger.kernel.org # 6.9+ >> Reviewed-by: Tarun K Singh <tarun.k.singh@intel.com> >> Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com> > > ...
diff --git a/drivers/net/ethernet/intel/idpf/idpf_lib.c b/drivers/net/ethernet/intel/idpf/idpf_lib.c index c3848e10e7db..b4fbb99bfad2 100644 --- a/drivers/net/ethernet/intel/idpf/idpf_lib.c +++ b/drivers/net/ethernet/intel/idpf/idpf_lib.c @@ -1786,6 +1786,7 @@ static int idpf_init_hard_reset(struct idpf_adapter *adapter) */ err = idpf_vc_core_init(adapter); if (err) { + cancel_delayed_work_sync(&adapter->mbx_task); idpf_deinit_dflt_mbx(adapter); goto unlock_mutex; } diff --git a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c index 3be883726b87..d77d6c3805e2 100644 --- a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c +++ b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c @@ -3017,11 +3017,6 @@ int idpf_vc_core_init(struct idpf_adapter *adapter) goto err_netdev_alloc; } - /* Start the mailbox task before requesting vectors. This will ensure - * vector information response from mailbox is handled - */ - queue_delayed_work(adapter->mbx_wq, &adapter->mbx_task, 0); - queue_delayed_work(adapter->serv_wq, &adapter->serv_task, msecs_to_jiffies(5 * (adapter->pdev->devfn & 0x07))); @@ -3046,7 +3041,6 @@ int idpf_vc_core_init(struct idpf_adapter *adapter) err_intr_req: cancel_delayed_work_sync(&adapter->serv_task); - cancel_delayed_work_sync(&adapter->mbx_task); idpf_vport_params_buf_rel(adapter); err_netdev_alloc: kfree(adapter->vports); @@ -3070,7 +3064,6 @@ int idpf_vc_core_init(struct idpf_adapter *adapter) adapter->state = __IDPF_VER_CHECK; if (adapter->vcxn_mngr) idpf_vc_xn_shutdown(adapter->vcxn_mngr); - idpf_deinit_dflt_mbx(adapter); set_bit(IDPF_HR_DRV_LOAD, adapter->flags); queue_delayed_work(adapter->vc_event_wq, &adapter->vc_event_task, msecs_to_jiffies(task_delay));