Message ID | 1476885181-3456-1-git-send-email-vkuznets@redhat.com |
---|---|
State | Accepted, archived |
Delegated to: | David Miller |
Headers | show |
Stephen Hemminger <sthemmin@microsoft.com> writes: > Do we need ACCESS_ONCE() here to avoid check/use issues? > I think we don't: this is the only place in the function where we read the variable so we'll get normal read. We're not trying to syncronize with netvsc_init_buf() as that would require locking, if we read stale NULL value after it was already updated on a different CPU we're fine, we'll just return -EAGAIN. > -----Original Message----- > From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com] > Sent: Wednesday, October 19, 2016 2:53 PM > To: netdev@vger.kernel.org > Cc: Stephen Hemminger <sthemmin@microsoft.com>; devel@linuxdriverproject.org; linux-kernel@vger.kernel.org; KY Srinivasan <kys@microsoft.com>; Haiyang Zhang <haiyangz@microsoft.com> > Subject: [PATCH net-next] hv_netvsc: fix a race between netvsc_send() and netvsc_init_buf() > > Fix in commit 880988348270 ("hv_netvsc: set nvdev link after populating > chn_table") turns out to be incomplete. A crash in > netvsc_get_next_send_section() is observed on mtu change when the device is under load. The race I identified is: if we get to netvsc_send() after we set net_device_ctx->nvdev link in netvsc_device_add() but before we finish netvsc_connect_vsp()->netvsc_init_buf() send_section_map is not allocated and we crash. Unfortunately we can't set net_device_ctx->nvdev link after the netvsc_init_buf() call as during the negotiation we need to receive packets and on the receive path we check for it. It would probably be possible to split nvdev into a pair of nvdev_in and nvdev_out links and check them accordingly in get_outbound_net_device()/ > get_inbound_net_device() but this looks like an overkill. > > Check that send_section_map is allocated in netvsc_send(). > > Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> > --- > drivers/net/hyperv/netvsc.c | 7 +++++++ > 1 file changed, 7 insertions(+) > > diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c index 720b5fa..e2bfaac 100644 > --- a/drivers/net/hyperv/netvsc.c > +++ b/drivers/net/hyperv/netvsc.c > @@ -888,6 +888,13 @@ int netvsc_send(struct hv_device *device, > if (!net_device) > return -ENODEV; > > + /* We may race with netvsc_connect_vsp()/netvsc_init_buf() and get > + * here before the negotiation with the host is finished and > + * send_section_map may not be allocated yet. > + */ > + if (!net_device->send_section_map) > + return -EAGAIN; > + > out_channel = net_device->chn_table[q_idx]; > > packet->send_buf_index = NETVSC_INVALID_INDEX; > -- > 2.7.4
From: Vitaly Kuznetsov <vkuznets@redhat.com> Date: Thu, 20 Oct 2016 10:51:04 +0200 > Stephen Hemminger <sthemmin@microsoft.com> writes: > >> Do we need ACCESS_ONCE() here to avoid check/use issues? >> > > I think we don't: this is the only place in the function where we read > the variable so we'll get normal read. We're not trying to syncronize > with netvsc_init_buf() as that would require locking, if we read stale > NULL value after it was already updated on a different CPU we're fine, > we'll just return -EAGAIN. The concern is if we race with netvsc_destroy_buf() and this pointer becomes NULL after the test you are adding.
David Miller <davem@davemloft.net> writes: > From: Vitaly Kuznetsov <vkuznets@redhat.com> > Date: Thu, 20 Oct 2016 10:51:04 +0200 > >> Stephen Hemminger <sthemmin@microsoft.com> writes: >> >>> Do we need ACCESS_ONCE() here to avoid check/use issues? >>> >> >> I think we don't: this is the only place in the function where we read >> the variable so we'll get normal read. We're not trying to syncronize >> with netvsc_init_buf() as that would require locking, if we read stale >> NULL value after it was already updated on a different CPU we're fine, >> we'll just return -EAGAIN. > > The concern is if we race with netvsc_destroy_buf() and this pointer > becomes NULL after the test you are adding. Thanks, this is interesting. We may reach to netvsc_destroy_buf() by 3 pathes: 1) cleanup path in netvsc_init_buf(). It is never called with send_section_map being not NULL so it seems we're safe. 2) from netvsc_remove() when the device is being removed. As far as I understand we can't be on the transmit path after we call unregister_netdev() so we're safe. 3) from netvsc_change_mtu() and netvsc_set_channels(). These pathes are specific to netvsc driver as basically we need to remove the device and add it back to change mtu/number of channels. The underligning 'struct net_device' stays but the rest is being removed and added back. On both pathes we first call netvsc_close() which does netif_tx_disable() and as far as I understand (I may be wrong) this guarantees that we won't be in netvsc_send(). So *I think* that we can't ever free send_section_map while in netvsc_send() and we can't even get to netvsc_send() after it is freed but I may have missed something.
From: Vitaly Kuznetsov <vkuznets@redhat.com> Date: Fri, 21 Oct 2016 13:15:53 +0200 > David Miller <davem@davemloft.net> writes: > >> From: Vitaly Kuznetsov <vkuznets@redhat.com> >> Date: Thu, 20 Oct 2016 10:51:04 +0200 >> >>> Stephen Hemminger <sthemmin@microsoft.com> writes: >>> >>>> Do we need ACCESS_ONCE() here to avoid check/use issues? >>>> >>> >>> I think we don't: this is the only place in the function where we read >>> the variable so we'll get normal read. We're not trying to syncronize >>> with netvsc_init_buf() as that would require locking, if we read stale >>> NULL value after it was already updated on a different CPU we're fine, >>> we'll just return -EAGAIN. >> >> The concern is if we race with netvsc_destroy_buf() and this pointer >> becomes NULL after the test you are adding. > > Thanks, this is interesting. > > We may reach to netvsc_destroy_buf() by 3 pathes: > > 1) cleanup path in netvsc_init_buf(). It is never called with > send_section_map being not NULL so it seems we're safe. > > 2) from netvsc_remove() when the device is being removed. As far as I > understand we can't be on the transmit path after we call > unregister_netdev() so we're safe. > > 3) from netvsc_change_mtu() and netvsc_set_channels(). These pathes are > specific to netvsc driver as basically we need to remove the device and > add it back to change mtu/number of channels. The underligning 'struct > net_device' stays but the rest is being removed and added back. On both > pathes we first call netvsc_close() which does netif_tx_disable() and as > far as I understand (I may be wrong) this guarantees that we won't be in > netvsc_send(). > > So *I think* that we can't ever free send_section_map while in > netvsc_send() and we can't even get to netvsc_send() after it is freed > but I may have missed something. Ok you may be right. Can't the device be taken down by asynchronous events as well? For example if the peer end of the interface in the other guest disappears.
David Miller <davem@davemloft.net> writes: > From: Vitaly Kuznetsov <vkuznets@redhat.com> > Date: Fri, 21 Oct 2016 13:15:53 +0200 > >> David Miller <davem@davemloft.net> writes: >> >>> From: Vitaly Kuznetsov <vkuznets@redhat.com> >>> Date: Thu, 20 Oct 2016 10:51:04 +0200 >>> >>>> Stephen Hemminger <sthemmin@microsoft.com> writes: >>>> >>>>> Do we need ACCESS_ONCE() here to avoid check/use issues? >>>>> >>>> >>>> I think we don't: this is the only place in the function where we read >>>> the variable so we'll get normal read. We're not trying to syncronize >>>> with netvsc_init_buf() as that would require locking, if we read stale >>>> NULL value after it was already updated on a different CPU we're fine, >>>> we'll just return -EAGAIN. >>> >>> The concern is if we race with netvsc_destroy_buf() and this pointer >>> becomes NULL after the test you are adding. >> >> Thanks, this is interesting. >> >> We may reach to netvsc_destroy_buf() by 3 pathes: >> >> 1) cleanup path in netvsc_init_buf(). It is never called with >> send_section_map being not NULL so it seems we're safe. >> >> 2) from netvsc_remove() when the device is being removed. As far as I >> understand we can't be on the transmit path after we call >> unregister_netdev() so we're safe. >> >> 3) from netvsc_change_mtu() and netvsc_set_channels(). These pathes are >> specific to netvsc driver as basically we need to remove the device and >> add it back to change mtu/number of channels. The underligning 'struct >> net_device' stays but the rest is being removed and added back. On both >> pathes we first call netvsc_close() which does netif_tx_disable() and as >> far as I understand (I may be wrong) this guarantees that we won't be in >> netvsc_send(). >> >> So *I think* that we can't ever free send_section_map while in >> netvsc_send() and we can't even get to netvsc_send() after it is freed >> but I may have missed something. > > Ok you may be right. > > Can't the device be taken down by asynchronous events as well? For example > if the peer end of the interface in the other guest disappears. The device may be hot removed from host's side. In this case we follow the following call chain: ... -> vmbus_device_unregister() -> ... -> vmbus_remove() -> netvsc_remove() and netvsc_remove() does netif_tx_disable(); unregister_netdev(); before calling rndis_filter_device_remove() leading to netvsc_destroy_buf(). So it seems we can't be in netvsc_send() when the device is disappearing.
From: Vitaly Kuznetsov <vkuznets@redhat.com> Date: Fri, 21 Oct 2016 17:17:18 +0200 > David Miller <davem@davemloft.net> writes: > >> From: Vitaly Kuznetsov <vkuznets@redhat.com> >> Date: Fri, 21 Oct 2016 13:15:53 +0200 >> >>> David Miller <davem@davemloft.net> writes: >>> >>>> From: Vitaly Kuznetsov <vkuznets@redhat.com> >>>> Date: Thu, 20 Oct 2016 10:51:04 +0200 >>>> >>>>> Stephen Hemminger <sthemmin@microsoft.com> writes: >>>>> >>>>>> Do we need ACCESS_ONCE() here to avoid check/use issues? >>>>>> >>>>> >>>>> I think we don't: this is the only place in the function where we read >>>>> the variable so we'll get normal read. We're not trying to syncronize >>>>> with netvsc_init_buf() as that would require locking, if we read stale >>>>> NULL value after it was already updated on a different CPU we're fine, >>>>> we'll just return -EAGAIN. >>>> >>>> The concern is if we race with netvsc_destroy_buf() and this pointer >>>> becomes NULL after the test you are adding. >>> >>> Thanks, this is interesting. >>> >>> We may reach to netvsc_destroy_buf() by 3 pathes: >>> >>> 1) cleanup path in netvsc_init_buf(). It is never called with >>> send_section_map being not NULL so it seems we're safe. >>> >>> 2) from netvsc_remove() when the device is being removed. As far as I >>> understand we can't be on the transmit path after we call >>> unregister_netdev() so we're safe. >>> >>> 3) from netvsc_change_mtu() and netvsc_set_channels(). These pathes are >>> specific to netvsc driver as basically we need to remove the device and >>> add it back to change mtu/number of channels. The underligning 'struct >>> net_device' stays but the rest is being removed and added back. On both >>> pathes we first call netvsc_close() which does netif_tx_disable() and as >>> far as I understand (I may be wrong) this guarantees that we won't be in >>> netvsc_send(). >>> >>> So *I think* that we can't ever free send_section_map while in >>> netvsc_send() and we can't even get to netvsc_send() after it is freed >>> but I may have missed something. >> >> Ok you may be right. >> >> Can't the device be taken down by asynchronous events as well? For example >> if the peer end of the interface in the other guest disappears. > > The device may be hot removed from host's side. In this case we follow > the following call chain: > > ... -> vmbus_device_unregister() -> ... -> vmbus_remove() -> netvsc_remove() > > and netvsc_remove() does netif_tx_disable(); unregister_netdev(); > before calling rndis_filter_device_remove() leading to netvsc_destroy_buf(). > > So it seems we can't be in netvsc_send() when the device is > disappearing. Ok, it all looks good then.
From: Vitaly Kuznetsov <vkuznets@redhat.com> Date: Wed, 19 Oct 2016 15:53:01 +0200 > Fix in commit 880988348270 ("hv_netvsc: set nvdev link after populating > chn_table") turns out to be incomplete. A crash in > netvsc_get_next_send_section() is observed on mtu change when the device > is under load. The race I identified is: if we get to netvsc_send() after > we set net_device_ctx->nvdev link in netvsc_device_add() but before we > finish netvsc_connect_vsp()->netvsc_init_buf() send_section_map is not > allocated and we crash. Unfortunately we can't set net_device_ctx->nvdev > link after the netvsc_init_buf() call as during the negotiation we need > to receive packets and on the receive path we check for it. It would > probably be possible to split nvdev into a pair of nvdev_in and nvdev_out > links and check them accordingly in get_outbound_net_device()/ > get_inbound_net_device() but this looks like an overkill. > > Check that send_section_map is allocated in netvsc_send(). > > Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Applied, thanks.
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c index 720b5fa..e2bfaac 100644 --- a/drivers/net/hyperv/netvsc.c +++ b/drivers/net/hyperv/netvsc.c @@ -888,6 +888,13 @@ int netvsc_send(struct hv_device *device, if (!net_device) return -ENODEV; + /* We may race with netvsc_connect_vsp()/netvsc_init_buf() and get + * here before the negotiation with the host is finished and + * send_section_map may not be allocated yet. + */ + if (!net_device->send_section_map) + return -EAGAIN; + out_channel = net_device->chn_table[q_idx]; packet->send_buf_index = NETVSC_INVALID_INDEX;
Fix in commit 880988348270 ("hv_netvsc: set nvdev link after populating chn_table") turns out to be incomplete. A crash in netvsc_get_next_send_section() is observed on mtu change when the device is under load. The race I identified is: if we get to netvsc_send() after we set net_device_ctx->nvdev link in netvsc_device_add() but before we finish netvsc_connect_vsp()->netvsc_init_buf() send_section_map is not allocated and we crash. Unfortunately we can't set net_device_ctx->nvdev link after the netvsc_init_buf() call as during the negotiation we need to receive packets and on the receive path we check for it. It would probably be possible to split nvdev into a pair of nvdev_in and nvdev_out links and check them accordingly in get_outbound_net_device()/ get_inbound_net_device() but this looks like an overkill. Check that send_section_map is allocated in netvsc_send(). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> --- drivers/net/hyperv/netvsc.c | 7 +++++++ 1 file changed, 7 insertions(+)