Message ID | 20200905025218.45268-1-decui@microsoft.com |
---|---|
State | Changes Requested |
Delegated to: | David Miller |
Series | [net] hv_netvsc: Fix hibernation for mlx5 VF driver |
On Fri, 4 Sep 2020 19:52:18 -0700 Dexuan Cui wrote:
> mlx5_suspend()/resume() keep the network interface, so during hibernation
> netvsc_unregister_vf() and netvsc_register_vf() are not called, and hence
> netvsc_resume() should call netvsc_vf_changed() to switch the data path
> back to the VF after hibernation.

Does suspending the system automatically switch back to the synthetic
datapath? Please clarify this in the commit message and/or add a code
comment.

> Similarly, netvsc_suspend() should not call netvsc_unregister_vf().
>
> BTW, mlx4_suspend()/resume() are different in that they destroy and
> re-create the network device, so netvsc_register_vf() and
> netvsc_unregister_vf() are automatically called. Note: mlx4 can also work
> with the changes here because in netvsc_suspend()/resume()
> ndev_ctx->vf_netdev is NULL for mlx4.
>
> Fixes: 0efeea5fb153 ("hv_netvsc: Add the support of hibernation")
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> ---
>  drivers/net/hyperv/netvsc_drv.c | 11 ++++++-----
>  1 file changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
> index 64b0a74c1523..f896059a9588 100644
> --- a/drivers/net/hyperv/netvsc_drv.c
> +++ b/drivers/net/hyperv/netvsc_drv.c
> @@ -2587,7 +2587,7 @@ static int netvsc_remove(struct hv_device *dev)
>  static int netvsc_suspend(struct hv_device *dev)
>  {
>  	struct net_device_context *ndev_ctx;
> -	struct net_device *vf_netdev, *net;
> +	struct net_device *net;
>  	struct netvsc_device *nvdev;
>  	int ret;

Please keep reverse xmas tree variable ordering.

> @@ -2604,10 +2604,6 @@ static int netvsc_suspend(struct hv_device *dev)
>  		goto out;
>  	}
>
> -	vf_netdev = rtnl_dereference(ndev_ctx->vf_netdev);
> -	if (vf_netdev)
> -		netvsc_unregister_vf(vf_netdev);
> -
>  	/* Save the current config info */
>  	ndev_ctx->saved_netvsc_dev_info = netvsc_devinfo_get(nvdev);
>
> @@ -2623,6 +2619,7 @@ static int netvsc_resume(struct hv_device *dev)
>  	struct net_device *net = hv_get_drvdata(dev);
>  	struct net_device_context *net_device_ctx;
>  	struct netvsc_device_info *device_info;
> +	struct net_device *vf_netdev;
>  	int ret;
>
>  	rtnl_lock();
> @@ -2635,6 +2632,10 @@ static int netvsc_resume(struct hv_device *dev)
>  	netvsc_devinfo_put(device_info);
>  	net_device_ctx->saved_netvsc_dev_info = NULL;
>
> +	vf_netdev = rtnl_dereference(net_device_ctx->vf_netdev);
> +	if (vf_netdev && netvsc_vf_changed(vf_netdev) != NOTIFY_OK)
> +		ret = -EINVAL;

Should you perhaps remove the VF in case of the failure?

>  	rtnl_unlock();
>
>  	return ret;
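The reverse xmas tree convention used in netdev asks that local variable declarations be ordered from the longest line to the shortest. As a quick illustration, the hypothetical helper `is_reverse_xmas_tree()` below (not part of the kernel or of checkpatch) simply compares line lengths; applied to the netvsc_suspend() locals, it should flag the order left by this patch and accept the one with `struct net_device *net;` moved below `struct netvsc_device *nvdev;`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only (not kernel or checkpatch
 * code): returns true if no declaration is longer than the one above it.
 * The strings omit the shared leading tab, which does not change the result.
 */
static bool is_reverse_xmas_tree(const char *const decls[], size_t n)
{
	for (size_t i = 1; i < n; i++) {
		if (strlen(decls[i]) > strlen(decls[i - 1]))
			return false;
	}
	return true;
}

/* netvsc_suspend() locals in the order this patch leaves them. */
static const char *const posted[] = {
	"struct net_device_context *ndev_ctx;",
	"struct net_device *net;",
	"struct netvsc_device *nvdev;",
	"int ret;",
};

/* The same locals reordered longest-first. */
static const char *const reordered[] = {
	"struct net_device_context *ndev_ctx;",
	"struct netvsc_device *nvdev;",
	"struct net_device *net;",
	"int ret;",
};
```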
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Saturday, September 5, 2020 4:27 PM
> [...]
> On Fri, 4 Sep 2020 19:52:18 -0700 Dexuan Cui wrote:
> > mlx5_suspend()/resume() keep the network interface, so during hibernation
> > netvsc_unregister_vf() and netvsc_register_vf() are not called, and hence
> > netvsc_resume() should call netvsc_vf_changed() to switch the data path
> > back to the VF after hibernation.
>
> Does suspending the system automatically switch back to the synthetic
> datapath?

Yes.

For mlx4, since the VF network interface is explicitly destroyed and re-created
during hibernation (i.e. suspend + resume), hv_netvsc explicitly switches the
data path from and to the VF.

For mlx5, the VF network interface persists across hibernation, so there is no
explicit switch-over, but after we close and re-open the vmbus channel of the
netvsc NIC in netvsc_suspend() and netvsc_resume(), the data path is implicitly
switched to the netvsc NIC, and with this patch netvsc_resume() ->
netvsc_vf_changed() switches the data path back to the mlx5 NIC.

> Please clarify this in the commit message and/or add a code
> comment.

I will add a comment in the commit message and the code.

> > @@ -2587,7 +2587,7 @@ static int netvsc_remove(struct hv_device *dev)
> >  static int netvsc_suspend(struct hv_device *dev)
> >  {
> >  	struct net_device_context *ndev_ctx;
> > -	struct net_device *vf_netdev, *net;
> > +	struct net_device *net;
> >  	struct netvsc_device *nvdev;
> >  	int ret;
>
> Please keep reverse xmas tree variable ordering.

Will do.

> > @@ -2635,6 +2632,10 @@ static int netvsc_resume(struct hv_device *dev)
> >  	netvsc_devinfo_put(device_info);
> >  	net_device_ctx->saved_netvsc_dev_info = NULL;
> >
> > +	vf_netdev = rtnl_dereference(net_device_ctx->vf_netdev);
> > +	if (vf_netdev && netvsc_vf_changed(vf_netdev) != NOTIFY_OK)
> > +		ret = -EINVAL;
>
> Should you perhaps remove the VF in case of the failure?

IMO this failure actually should not happen since we're resuming the netvsc
NIC, so we're sure we have a valid pointer to the netvsc net device, and
netvsc_vf_changed() should be able to find the netvsc pointer and return
NOTIFY_OK. In case of a failure, something really bad must be happening,
and I'm not sure if it's safe to simply remove the VF, so I just return
-EINVAL for simplicity, since I believe the failure should not happen in practice.

I would rather keep the code as-is, but I'm OK to add a WARN_ON(1) if you
think that's necessary.

Thanks,
-- Dexuan
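A toy model can make the two hibernation flows described above concrete. This is a hypothetical userspace sketch, not netvsc code: `struct nic_model`, `model_hibernate()` and the other helpers are invented names, `vf_present` stands in for `ndev_ctx->vf_netdev != NULL`, and the ordering it assumes (the mlx4 netdev is already gone during netvsc_suspend()/resume() and is re-created afterwards) is taken from the commit message rather than from the PM core.

```c
#include <stdbool.h>

/* Which path the host uses for the guest's traffic (toy model). */
enum datapath { DP_SYNTHETIC, DP_VF };

/* Hypothetical stand-in for the relevant netvsc state. */
struct nic_model {
	bool vf_present;   /* models ndev_ctx->vf_netdev != NULL */
	bool vf_recreated; /* true for mlx4-like drivers, false for mlx5 */
	enum datapath dp;
};

/* Models netvsc_register_vf(): the VF netdev appears, switch to it. */
static void model_register_vf(struct nic_model *m)
{
	m->vf_present = true;
	m->dp = DP_VF;
}

/* Models netvsc_unregister_vf(): the VF netdev goes away, fall back. */
static void model_unregister_vf(struct nic_model *m)
{
	m->vf_present = false;
	m->dp = DP_SYNTHETIC;
}

/* One hibernation cycle (suspend + resume), with or without the patch. */
static void model_hibernate(struct nic_model *m, bool patched)
{
	bool had_vf = m->vf_present;

	/* VF driver suspend: mlx4 destroys its netdev, mlx5 keeps it. */
	if (m->vf_recreated && m->vf_present)
		model_unregister_vf(m);

	/* netvsc_suspend(): before the patch, it unregistered the VF. */
	if (!patched && m->vf_present)
		model_unregister_vf(m);
	m->dp = DP_SYNTHETIC; /* vmbus channel closed and later re-opened */

	/* netvsc_resume(): with the patch, switch back to a surviving VF. */
	if (patched && m->vf_present)
		m->dp = DP_VF; /* models netvsc_vf_changed() */

	/* VF driver resume: mlx4 re-creates its netdev. */
	if (m->vf_recreated && had_vf)
		model_register_vf(m);
}
```

In this model, an mlx5-style VF (`vf_recreated = false`) with `patched = false` stays on the synthetic path after resume, which is the bug the patch addresses; with `patched = true`, or with an mlx4-style VF either way, the cycle ends on the VF.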
On Sun, 6 Sep 2020 03:05:48 +0000 Dexuan Cui wrote:
> > > @@ -2635,6 +2632,10 @@ static int netvsc_resume(struct hv_device *dev)
> > >  	netvsc_devinfo_put(device_info);
> > >  	net_device_ctx->saved_netvsc_dev_info = NULL;
> > >
> > > +	vf_netdev = rtnl_dereference(net_device_ctx->vf_netdev);
> > > +	if (vf_netdev && netvsc_vf_changed(vf_netdev) != NOTIFY_OK)
> > > +		ret = -EINVAL;
> >
> > Should you perhaps remove the VF in case of the failure?
> IMO this failure actually should not happen since we're resuming the netvsc
> NIC, so we're sure we have a valid pointer to the netvsc net device, and
> netvsc_vf_changed() should be able to find the netvsc pointer and return
> NOTIFY_OK. In case of a failure, something really bad must be happening,
> and I'm not sure if it's safe to simply remove the VF, so I just return
> -EINVAL for simplicity, since I believe the failure should not happen in practice.

Okay, I see that the errors propagated by netvsc_vf_changed() aren't
actually coming from netvsc_switch_datapath(), so you're right. The
failures here won't be meaningful.

> I would rather keep the code as-is, but I'm OK to add a WARN_ON(1) if you
> think that's necessary.

No need, I think core will complain when resume callback fails. That
should be sufficient.
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 64b0a74c1523..f896059a9588 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -2587,7 +2587,7 @@ static int netvsc_remove(struct hv_device *dev)
 static int netvsc_suspend(struct hv_device *dev)
 {
 	struct net_device_context *ndev_ctx;
-	struct net_device *vf_netdev, *net;
+	struct net_device *net;
 	struct netvsc_device *nvdev;
 	int ret;
 
@@ -2604,10 +2604,6 @@ static int netvsc_suspend(struct hv_device *dev)
 		goto out;
 	}
 
-	vf_netdev = rtnl_dereference(ndev_ctx->vf_netdev);
-	if (vf_netdev)
-		netvsc_unregister_vf(vf_netdev);
-
 	/* Save the current config info */
 	ndev_ctx->saved_netvsc_dev_info = netvsc_devinfo_get(nvdev);
 
@@ -2623,6 +2619,7 @@ static int netvsc_resume(struct hv_device *dev)
 	struct net_device *net = hv_get_drvdata(dev);
 	struct net_device_context *net_device_ctx;
 	struct netvsc_device_info *device_info;
+	struct net_device *vf_netdev;
 	int ret;
 
 	rtnl_lock();
@@ -2635,6 +2632,10 @@ static int netvsc_resume(struct hv_device *dev)
 	netvsc_devinfo_put(device_info);
 	net_device_ctx->saved_netvsc_dev_info = NULL;
 
+	vf_netdev = rtnl_dereference(net_device_ctx->vf_netdev);
+	if (vf_netdev && netvsc_vf_changed(vf_netdev) != NOTIFY_OK)
+		ret = -EINVAL;
+
 	rtnl_unlock();
 
 	return ret;
mlx5_suspend()/resume() keep the network interface, so during hibernation
netvsc_unregister_vf() and netvsc_register_vf() are not called, and hence
netvsc_resume() should call netvsc_vf_changed() to switch the data path
back to the VF after hibernation.

Similarly, netvsc_suspend() should not call netvsc_unregister_vf().

BTW, mlx4_suspend()/resume() are different in that they destroy and
re-create the network device, so netvsc_register_vf() and
netvsc_unregister_vf() are automatically called. Note: mlx4 can also work
with the changes here because in netvsc_suspend()/resume()
ndev_ctx->vf_netdev is NULL for mlx4.

Fixes: 0efeea5fb153 ("hv_netvsc: Add the support of hibernation")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
---
 drivers/net/hyperv/netvsc_drv.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)